A research-grade algorithm recommendation system that automatically selects the optimal algorithm for sorting, array searching, graph searching, string searching, and tree searching problems. The system combines algorithmic theory (correctness constraints) with machine learning (data-driven selection) to provide intelligent, explainable recommendations.
This system addresses the algorithm selection problem: given a problem instance, which algorithm should be used for optimal performance?
Traditional approaches either:
- Use rule-based systems (rigid, don't adapt to data)
- Use pure ML (may violate correctness constraints)
Our approach is a hybrid system that:
- Enforces strict correctness constraints (e.g., never recommends Binary Search for unsorted arrays)
- Uses ML for optimization among valid algorithms
- Provides human-readable explanations for all recommendations
- Includes automatic structure inference to detect input type
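The auto-detection idea can be sketched as a dispatch on input shape. This is an illustrative sketch only; the function name `infer_structure` and the heuristics are assumptions mirroring what `utils/structure_inference.py` might do, not the project's actual code.

```python
# Hypothetical sketch of input-type auto-detection (illustrative names).
def infer_structure(data):
    """Guess the problem domain from the shape of the input."""
    if isinstance(data, str):
        return "string_search"
    if isinstance(data, dict):
        # Adjacency lists ({node: [(neighbor, weight), ...]}) look like graphs.
        return "graph_search"
    if hasattr(data, "insert") and hasattr(data, "root"):
        return "tree_search"  # BST/AVL objects expose insert() and a root
    if isinstance(data, (list, tuple)):
        # A sorted sequence suggests searching; an unsorted one suggests sorting.
        is_sorted = all(a <= b for a, b in zip(data, data[1:]))
        return "array_search" if is_sorted else "sorting"
    raise TypeError(f"Unsupported input type: {type(data).__name__}")

infer_structure([3, 1, 2])        # -> "sorting"
infer_structure({0: [(1, 1.0)]})  # -> "graph_search"
```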
| Algorithm | Best Case | Average | Worst | Space | When to Use |
|---|---|---|---|---|---|
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Small/nearly sorted arrays |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | Very small arrays |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Large arrays, stability needed |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | General purpose |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | Guaranteed performance + in-place |
Features Extracted:
- Input size (log-normalized)
- Sortedness measure (inversions ratio)
- Duplicate ratio
- Distribution characteristics
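The features above can be computed directly from the array. The sketch below, assuming a simple O(n²) inversion count, illustrates the sortedness and duplicate-ratio definitions; the project's extractor in `sorting/features.py` may differ in detail.

```python
import math

# Illustrative feature extraction matching the list above (a sketch,
# not the project's actual implementation).
def sortedness(arr):
    """1 - inversions / max_inversions; 1.0 = sorted, 0.0 = reverse sorted."""
    n = len(arr)
    if n < 2:
        return 1.0
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j]
    )  # O(n^2) pair count; acceptable for offline feature extraction
    return 1.0 - inversions / (n * (n - 1) / 2)

def extract_features(arr):
    return {
        "size_log": math.log1p(len(arr)),                      # log-normalized size
        "sortedness": sortedness(arr),                         # inversions ratio
        "duplicate_ratio": 1.0 - len(set(arr)) / len(arr),     # fraction of dupes
    }

sortedness([1, 2, 3, 4])  # -> 1.0
sortedness([4, 3, 2, 1])  # -> 0.0
```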
| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| Linear Search | O(n) | None | Unsorted arrays, small arrays |
| Binary Search | O(log n) | Sorted only | Large sorted arrays |
| Jump Search | O(√n) | Sorted only | Sorted arrays, forward-only access |
| Exponential Search | O(log n) | Sorted + Uniform | Large sorted uniform arrays |
Correctness Constraints (STRICT):
- ❌ Binary/Jump/Exponential search NEVER recommended for unsorted arrays
- ❌ Exponential search NEVER recommended for non-uniform distributions
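The constraint-filtering step can be expressed as a lookup from algorithm to validity predicate. This is a minimal sketch; the names and feature keys are assumptions mirroring the constraints above, and the real logic lives in `searching/array_search/`.

```python
# Hedged sketch of strict constraint enforcement for array search.
CONSTRAINTS = {
    "linear_search":      lambda f: True,                            # always valid
    "binary_search":      lambda f: f["is_sorted"],                  # sorted only
    "jump_search":        lambda f: f["is_sorted"],                  # sorted only
    "exponential_search": lambda f: f["is_sorted"] and f["is_uniform"],
}

def valid_algorithms(features):
    """Return only the algorithms whose correctness constraints hold;
    the ML model then picks among these, never outside them."""
    return [name for name, ok in CONSTRAINTS.items() if ok(features)]

valid_algorithms({"is_sorted": False, "is_uniform": False})
# -> ["linear_search"]
```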
| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| BFS | O(V + E) | Unweighted only | Shortest path in unweighted graphs |
| Dijkstra | O((V+E) log V) | Non-negative weights | General weighted graphs |
| Bellman-Ford | O(VE) | Handles negatives | Graphs with negative weights |
| A* | O(E) best | Requires heuristic | When good heuristic available |
Correctness Constraints (STRICT):
- ❌ BFS NEVER recommended for weighted graphs
- ❌ Dijkstra NEVER recommended when negative weights exist
- ❌ A* NEVER recommended without heuristic
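The same filtering pattern applies to graph search. The sketch below assumes the feature names from the graph feature list; it is illustrative, not the project's actual code.

```python
# Hedged sketch of graph-search constraint enforcement.
def valid_graph_algorithms(f):
    valid = []
    if not f["is_weighted"]:
        valid.append("bfs")              # BFS is correct only on unweighted graphs
    if not f["has_negative_weights"]:
        valid.append("dijkstra")         # Dijkstra requires non-negative weights
    valid.append("bellman_ford")         # correct whenever no negative cycle exists
    if f["has_heuristic"] and not f["has_negative_weights"]:
        valid.append("a_star")           # A* needs an admissible heuristic
    return valid

valid_graph_algorithms(
    {"is_weighted": True, "has_negative_weights": True, "has_heuristic": False}
)
# -> ["bellman_ford"]
```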
| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| Naive String Search | O(nm) | None | Single pattern, short text |
| Trie Search | O(m) per query | Multiple patterns OR prefix queries | Multiple patterns, autocomplete |
Correctness Constraints (STRICT):
- ❌ Trie search NEVER recommended for single exact pattern (preprocessing overhead not justified)
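A minimal Trie illustrates why the preprocessing pays off only when there are multiple patterns or prefix queries: building costs O(total pattern length) once, after which all patterns starting at a text position are found in one walk. This is an illustrative sketch, not the project's implementation in `searching/string_search/algorithms.py`.

```python
# Minimal Trie sketch (illustrative; real version lives in string_search/).
class Trie:
    def __init__(self, patterns):
        self.root = {}
        for p in patterns:                  # O(total pattern length) build
            node = self.root
            for ch in p:
                node = node.setdefault(ch, {})
            node["$"] = p                   # "$" marks the end of a pattern

    def matches_at(self, text, i):
        """All patterns that start at position i of text, in one walk."""
        node, found = self.root, []
        for ch in text[i:]:
            if "$" in node:
                found.append(node["$"])
            if ch not in node:
                return found
            node = node[ch]
        if "$" in node:
            found.append(node["$"])
        return found

Trie(["he", "hello"]).matches_at("hello world", 0)  # -> ["he", "hello"]
```

For a single exact pattern, the build cost is pure overhead over a naive scan, which is exactly what the constraint above encodes.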
| Algorithm | Complexity | Constraint | When to Use |
|---|---|---|---|
| BST Search | O(h) | None | Any BST, unknown balance |
| Balanced Tree Search | O(log n) | Balanced tree only | AVL/Red-Black trees |
Correctness Constraints (STRICT):
- ❌ Balanced tree search NEVER recommended for unbalanced trees (loses O(log n) guarantee)
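One way to enforce this constraint is to compare the tree's height against the ideal log₂(n) height. The slack factor below is an illustrative assumption; the project's actual threshold lives in `tree_search/constraints.py`.

```python
import math

# Sketch of a balance check like the one the constraint above implies
# (the slack factor 1.5 is an illustrative assumption).
def is_balanced_enough(height, num_nodes, slack=1.5):
    """True if height stays within a constant factor of the optimal
    log2(n) height, so O(log n) search behavior is plausible."""
    if num_nodes <= 1:
        return True
    return height <= slack * math.log2(num_nodes) + 1

is_balanced_enough(height=4, num_nodes=15)   # roughly balanced -> True
is_balanced_enough(height=19, num_nodes=19)  # fully skewed chain -> False
```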
algorithm_recommender/
├── sorting/
│   ├── algorithms.py            # Sorting algorithm implementations
│   ├── features.py              # Feature extraction
│   ├── dataset.py               # Training data generation
│   ├── recommender.py           # ML-based recommender
│   ├── train.py                 # Training script
│   └── test.py                  # Test script
│
├── searching/
│   ├── array_search/
│   │   ├── algorithms.py        # Search algorithm implementations
│   │   ├── features.py          # Feature extraction with correctness
│   │   ├── dataset.py           # Training data (respects constraints)
│   │   ├── recommender.py       # Recommender with constraint enforcement
│   │   ├── train.py             # Training script
│   │   └── test.py              # Correctness-focused tests
│   │
│   ├── graph_search/
│   │   ├── algorithms.py        # BFS, Dijkstra, Bellman-Ford, A*
│   │   ├── features.py          # Graph feature extraction
│   │   ├── dataset.py           # Training data generation
│   │   ├── recommender.py       # Graph search recommender
│   │   ├── train.py             # Training script
│   │   └── test.py              # Path correctness tests
│   │
│   ├── string_search/           # NEW: String/pattern matching
│   │   ├── algorithms.py        # Naive search, Trie search
│   │   ├── features.py          # Text/pattern feature extraction
│   │   ├── constraints.py       # Trie validity constraints
│   │   ├── dataset.py           # Training data generation
│   │   ├── recommender.py       # String search recommender
│   │   ├── train.py             # Training script
│   │   └── test.py              # Correctness tests
│   │
│   └── tree_search/             # NEW: Tree-based searching
│       ├── algorithms.py        # BST, AVL tree, search functions
│       ├── features.py          # Tree feature extraction (balance, height)
│       ├── constraints.py       # Balance threshold constraints
│       ├── dataset.py           # Training data generation
│       ├── recommender.py       # Tree search recommender
│       ├── train.py             # Training script
│       └── test.py              # Balance-aware tests
│
├── models/                      # Saved trained models
│   ├── sorting_model.pkl
│   ├── array_search_model.pkl
│   ├── graph_search_model.pkl
│   ├── string_search_model.pkl  # NEW
│   └── tree_search_model.pkl    # NEW
│
├── utils/                       # Shared utilities
│   ├── timing.py                # Benchmarking utilities
│   ├── statistics.py            # Statistical helpers
│   ├── validation.py            # Input validation
│   └── structure_inference.py   # NEW: Auto-detect input type
│
├── web/                         # Web interface
│   ├── app.py                   # Flask application
│   ├── templates/               # HTML templates
│   └── static/                  # CSS/JS assets
│
├── unified_interface.py         # Single entry point for all recommendations
├── __init__.py                  # Package initialization
└── README.md                    # This file
# Clone or download the project
cd algorithm_recommender
# Install dependencies
pip install numpy scikit-learn

from algorithm_recommender import AlgorithmRecommender
# Initialize
recommender = AlgorithmRecommender()
# Train models (first time) or load existing models
recommender.train_all() # Generates training data and trains models
# OR
recommender.load_models() # Load pre-trained models
# ===== SORTING =====
result = recommender.recommend_sorting([5, 2, 8, 1, 9, 3, 7])
print(f"Algorithm: {result['algorithm']}")
print(f"Why: {result['explanation']}")
# Output: Algorithm: insertion_sort
# Output: Why: Array is small, highly unsorted → Insertion Sort selected.
# Actually sort the array
sorted_arr, algo, why = recommender.sort([5, 2, 8, 1, 9])
print(sorted_arr) # [1, 2, 5, 8, 9]
# ===== ARRAY SEARCH =====
result = recommender.recommend_array_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(f"Algorithm: {result['algorithm']}")
print(f"Valid options: {result['valid_algorithms']}")
# Output: Algorithm: binary_search
# Output: Valid options: ['linear_search', 'binary_search', 'jump_search', 'exponential_search']
# Search for a value
idx, algo, why = recommender.search_array([1, 2, 3, 4, 5], 3)
print(f"Found at index {idx} using {algo}")
# ===== GRAPH SEARCH =====
graph = {
0: [(1, 4.0), (2, 1.0)],
1: [(3, 1.0)],
2: [(1, 2.0), (3, 5.0)],
3: []
}
result = recommender.recommend_graph_search(graph)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: dijkstra
# Find shortest path
path, cost, algo, why = recommender.search_graph(graph, start=0, goal=3)
print(f"Path: {path}, Cost: {cost}")
# Output: Path: [0, 2, 1, 3], Cost: 4.0
# ===== STRING SEARCH (NEW) =====
# Single pattern - only naive search valid
result = recommender.recommend_string_search("hello world", ["world"])
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: naive_string_search
# Multiple patterns - trie search valid and optimal
result = recommender.recommend_string_search("hello world hello", ["hello", "world", "foo"])
print(f"Algorithm: {result['algorithm']}")
print(f"Valid options: {result['valid_algorithms']}")
# Output: Algorithm: trie_search
# Output: Valid options: ['naive_string_search', 'trie_search']
# ===== TREE SEARCH (NEW) =====
from searching.tree_search.algorithms import BST, AVLTree
# Balanced AVL tree - balanced search valid
avl = AVLTree()
for k in [5, 3, 7, 1, 9, 4, 6]:
avl.insert(k)
result = recommender.recommend_tree_search(avl)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: balanced_tree_search
# Skewed BST - only bst_search valid
bst = BST()
for k in range(1, 20): # Sorted insertion creates skew
bst.insert(k)
result = recommender.recommend_tree_search(bst)
print(f"Algorithm: {result['algorithm']}")
# Output: Algorithm: bst_search
# ===== AUTO-DETECTION (NEW) =====
# The system can automatically detect what kind of problem you have
result = recommender.recommend_auto([5, 2, 8, 1, 9], operation_hint="sort")
print(f"Task: {result['task_type']}, Algorithm: {result['algorithm']}")
# Output: Task: sorting, Algorithm: insertion_sort

cd algorithm_recommender
python train_all.py # Trains all 5 models
# OR
python unified_interface.py # Runs demo which trains if needed

# Sorting
python -m sorting.train --num_instances 1000
# Array Search
python -m searching.array_search.train --num_instances 500
# Graph Search
python -m searching.graph_search.train --num_instances 400
# String Search (NEW)
python -m searching.string_search.train --num_instances 300
# Tree Search (NEW)
python -m searching.tree_search.train --num_instances 300

# Sorting tests
python -m sorting.test
# Array search tests (correctness-focused)
python -m searching.array_search.test
# Graph search tests
python -m searching.graph_search.test
# String search tests (NEW)
python -m searching.string_search.test
# Tree search tests (NEW)
python -m searching.tree_search.test

The system includes a professional web interface built with Flask.
cd algorithm_recommender
# Option 1: Run directly
python web/app.py
# Option 2: Use the batch file (Windows)
web\run_web.bat

Then open http://localhost:5000 in your browser.
- Task Detection: Automatically detects sorting, array search, graph search, string search, or tree search from natural language
- Dynamic Input Forms: Different inputs for each task type
- Feature Visualization: Shows extracted features in real-time
- Algorithm Recommendation: Displays the recommended algorithm with explanation
- Confidence Scores: Visual bar chart of algorithm probabilities
- Algorithm Execution: Execute the recommended algorithm and see results
- Failure Mode Analysis: Educational feature showing constraint enforcement (see below)
The web app includes an optional, read-only "Failure Mode Analysis" feature that demonstrates why correctness constraints are essential:
What It Shows:
- Algorithm WITHOUT Constraints: What the raw ML model would predict
- Why It's Invalid: Explanation of the correctness constraint violated
- Corrected Algorithm: The valid algorithm after constraint enforcement
How to Use:
- Get a recommendation for any task
- In the results section, click "Analyze Constraints" button
- View the side-by-side comparison of unconstrained vs constrained choices
Example (Array Search on unsorted array):
┌────────────────────────────────────────────────────────────────┐
│ Failure Mode Analysis                                          │
├────────────────────────────────────────────────────────────────┤
│ Input: Array [5, 2, 8, 1, 9] (UNSORTED)                        │
│                                                                │
│   ┌─────────────────────┐        ┌─────────────────────┐       │
│   │ WITHOUT Constraints │        │ WITH Constraints    │       │
│   │ Binary Search       │  ───→  │ Linear Search       │       │
│   │ ✗ [INVALID]         │        │ ✓ [GUARANTEED       │       │
│   │                     │        │    CORRECT]         │       │
│   └─────────────────────┘        └─────────────────────┘       │
│                                                                │
│ ⚠ Why Invalid: Binary Search requires a sorted array.          │
│   The input array is NOT sorted, so Binary Search would        │
│   produce incorrect results.                                   │
│                                                                │
│ ✓ Constraint System: Overrode ML choice to Linear Search,      │
│   which works correctly on any array.                          │
└────────────────────────────────────────────────────────────────┘
This feature helps users understand:
- Why pure ML is insufficient for algorithm selection
- How correctness constraints prevent invalid recommendations
- The value of the hybrid ML + theory approach
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Main web interface |
| `/api/detect-task` | POST | Detect task from problem statement |
| `/api/recommend/sorting` | POST | Get sorting recommendation |
| `/api/recommend/array-search` | POST | Get array search recommendation |
| `/api/recommend/graph-search` | POST | Get graph search recommendation |
| `/api/recommend/string-search` | POST | Get string search recommendation |
| `/api/recommend/tree-search` | POST | Get tree search recommendation |
| `/api/execute/sorting` | POST | Execute sorting algorithm |
| `/api/execute/array-search` | POST | Execute search algorithm |
| `/api/execute/graph-search` | POST | Execute graph search algorithm |
| `/api/failure-mode` | POST | Failure mode analysis (unconstrained vs constrained) |
| `/api/algorithms` | GET | List all supported algorithms |
A Streamlit version is also available:
streamlit run app.py

Each domain has a dedicated feature extractor that computes relevant characteristics:
Sorting Features:
- `size_log`: Normalized log of array size
- `sortedness`: 1 - (inversions / max_inversions)
- `duplicate_ratio`: Fraction of duplicate elements
- `is_nearly_sorted`: Binary indicator for high sortedness
- `is_reverse_sorted`: Binary indicator for descending order
Array Search Features:
- `size_log`: Normalized input size
- `is_sorted`: Critical for algorithm validity
- `is_uniform`: Distribution uniformity measure
- `gap_consistency`: Measure of value spacing regularity
Graph Search Features:
- `num_nodes_log`, `num_edges_log`: Size metrics
- `edge_density`: E / (V * (V-1))
- `is_weighted`: Whether edges have varying weights
- `has_negative_weights`: Critical constraint feature
- `has_heuristic`: Enables A* selection
Training labels are determined using a hybrid empirical + theoretical approach:
- Empirical: Benchmark all valid algorithms on generated instances
- Theoretical: Apply domain-knowledge priors to adjust scores
- Hybrid Score: `(1 - α) * empirical + α * theoretical`
This prevents dominance bias where one algorithm (e.g., QuickSort) always wins empirically but isn't theoretically optimal for all cases.
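A small worked example shows the effect. The α value and score numbers below are illustrative assumptions, not the project's actual settings:

```python
# Hybrid labeling score sketch: blend benchmark results with theory priors.
def hybrid_score(empirical, theoretical, alpha=0.3):
    return (1 - alpha) * empirical + alpha * theoretical

# QuickSort wins the benchmark, but theory favors Insertion Sort on a
# nearly sorted instance; the blend flips the label:
scores = {
    "quick_sort":     hybrid_score(empirical=0.9, theoretical=0.2),   # 0.69
    "insertion_sort": hybrid_score(empirical=0.7, theoretical=0.95),  # 0.775
}
max(scores, key=scores.get)  # -> "insertion_sort"
```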
- Algorithm: RandomForestClassifier
- Why RandomForest:
- Handles non-linear relationships
- Provides feature importance
- Robust to overfitting
- Works well with small-medium datasets
- Hyperparameters: 100 trees, max_depth=12-15, balanced class weights
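With scikit-learn, the model setup implied by those hyperparameters looks roughly like the following. The toy data is a stand-in; the real training scripts wire in the extracted features and hybrid-score labels:

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Classifier configured per the hyperparameters listed above (a sketch).
clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=12,
    class_weight="balanced",   # compensate for imbalanced algorithm labels
    random_state=42,           # reproducibility, as the project emphasizes
)

# Toy (features, algorithm-label) pairs standing in for real training data.
rng = np.random.default_rng(0)
X = rng.random((200, 4))                 # e.g. size_log, sortedness, ...
y = (X[:, 1] > 0.5).astype(int)          # stand-in labels
clf.fit(X, y)
clf.feature_importances_                 # per-feature importance, for interpretability
```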
During training, we evaluate:
- Cross-validation accuracy (5-fold)
- Test accuracy on held-out data
- Classification report per algorithm
- Feature importance for interpretability
During testing, we verify:
- Correctness constraints are never violated
- Algorithm diversity - different algorithms for different regimes
- Path/sort correctness - results are actually correct
This project follows research-level expectations:
- Clear separation between rule-based constraints and ML-based optimization
- Extensive documentation with docstrings and comments
- Reproducibility through random seed control
- Theoretical grounding with complexity analysis
- Explainability - every recommendation includes justification
- Extensibility - easy to add new algorithms
- Implement in `sorting/algorithms.py`:

def tim_sort(arr: List[float]) -> List[float]:
    """Tim Sort implementation."""
    # ... implementation
    return result

SORTING_ALGORITHMS['tim_sort'] = tim_sort

- Add theoretical priors in `sorting/dataset.py`:

PRIORS['tim_sort'] = {
    'is_nearly_sorted': 0.4,
    'size_large': 0.3,
}

- Retrain the model.
Similar process, but ensure you also define correctness constraints in ALGORITHM_CONSTRAINTS.
- Python 3.8+
- NumPy
- scikit-learn
MIT License
Algorithm Recommender Research Team
This project draws inspiration from:
- Algorithm Selection literature (Rice, 1976)
- AutoML and meta-learning research
- Classic algorithms textbooks (CLRS, Sedgewick)