feat: automated index recommendations with live schema (#7) by ringo380 · Pull Request #54 · ringo380/QueryGrade

ringo380 · 2026-05-08T02:53:26Z

Closes #7. Absorbs the schema-introspection slice of #6.

Summary

Schema-aware index recommendations driven by EXPLAIN cost-deltas (HypoPG on PostgreSQL, stats-based heuristics on MySQL/SQLite), with redundancy detection against existing indexes and engine-specific CREATE/DROP DDL.
Persisted per-user database connections with Fernet-encrypted credentials, CRUD UI at /connections/.
Wired end-to-end into the grade flow — when a user picks a saved connection, results appear in a new "Index recommendations" panel on the grade-results page.

What's in here

Layer	What	Where
1	`UserDatabaseConnection` model, Fernet crypto, CRUD views/templates	`analyzer/models/connection_models.py`, `analyzer/services/connection_crypto.py`, `analyzer/views/connection_views.py`, `analyzer/templates/analyzer/connections/`
2	`LiveSchemaContext` (Redis-cached schema snapshots) + wires `DatabaseStatisticsManager._fetch_live_statistics`	`analyzer/services/live_schema_context.py`, `analyzer/ml/integration/database_stats.py`
3	`IndexRecommender` service + DB-specific DDL generator	`analyzer/services/index_recommender.py`, `analyzer/services/index_script_generator.py`
4	`QueryAnalysis.index_recommendations` JSONField + grade-flow wiring	migration 0005, `analyzer/views/query_grading_views.py`, `analyzer/forms/query_forms.py`
5	UI panel (CodeMirror DDL blocks, confidence pill, redundancy badge)	`analyzer/templates/analyzer/grade_results.html`, `grade_form.html`, `base.html`
6	Companion `extract_index_features` for future supervised training (does not change the 45-feature main vector)	`analyzer/ml/core/feature_extractor.py`
7	GA4 `index_recommendation_generated` server-side event via session-flag pattern	`analyzer/views/query_grading_views.py`
8	43 new tests (12 connections + 7 live-schema + 20 recommender + 4 integration)	`analyzer/test_connections.py`, `test_live_schema_context.py`, `test_index_recommender.py`, `test_grade_with_live_schema.py`
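Layer 6 keeps index features in a companion dict instead of widening the 45-feature vector. Below is a minimal sketch of that shape. The feature names come from this PR's commit notes, and splitting unindexed_join/where_columns into two keys is an assumption. `TableInfo` and the argument layout are hypothetical. The sketch also simplifies "indexed" to mean "leads some existing index". It is not the actual `FeatureExtractor.extract_index_features`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Hypothetical stand-in for one table in a live-schema snapshot."""
    name: str
    row_count: int
    # Each existing index is a tuple of column names in key order.
    indexes: list[tuple[str, ...]] = field(default_factory=list)

    def leads_an_index(self, column: str) -> bool:
        # Simplification: a column counts as indexed only if some index starts with it.
        return any(ix and ix[0] == column for ix in self.indexes)


def extract_index_features(
    tables: list[TableInfo],
    where_columns: list[tuple[str, str]],
    join_columns: list[tuple[str, str]],
    recommended_index_count: int,
) -> dict[str, int]:
    """Build a companion feature dict, kept apart from the main grading vector."""
    by_name = {t.name: t for t in tables}

    def count_unindexed(refs: list[tuple[str, str]]) -> int:
        # refs are (table, column) pairs referenced by the query.
        return sum(
            1
            for table, column in refs
            if table in by_name and not by_name[table].leads_an_index(column)
        )

    return {
        "existing_index_count": sum(len(t.indexes) for t in tables),
        "unindexed_where_columns": count_unindexed(where_columns),
        "unindexed_join_columns": count_unindexed(join_columns),
        "largest_table_rows": max((t.row_count for t in tables), default=0),
        "recommended_index_count": recommended_index_count,
    }
```

Because the result is a separate dict, it can be stored next to the recommendations without changing the input shape that deployed models expect.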

Cost-benefit approach (why not HybridQueryGrader?)

HybridQueryGrader scores query text patterns — it has no awareness of indexes or schema, so feeding hypothetical indexes through it would produce identical scores. Real cost-benefit needs EXPLAIN-grounded data:

PostgreSQL: detects HypoPG via pg_extension, creates each candidate as a hypothetical index, and diffs Total Cost from EXPLAIN (FORMAT JSON). HIGH confidence. Falls back to the heuristic and emits an advisory if HypoPG isn't installed.
MySQL / SQLite: row-count × selectivity heuristic (1/√N for equality on tables >10k rows, 0.3 for range, 0.5 for ORDER BY/JOIN). MEDIUM confidence with row counts, LOW without.
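The two scoring paths can be sketched in plain Python. This sketch assumes the EXPLAIN (FORMAT JSON) text for the query with and without the hypothetical index has already been fetched. How the service drives HypoPG is not shown. Mapping a heuristic selectivity s to a predicted improvement of 1 - s is an assumption, not the PR's documented formula. Returning 0.0 for equality on tables of 10k rows or fewer is likewise a hypothetical choice.

```python
from __future__ import annotations

import json
import math

LARGE_TABLE_ROWS = 10_000  # ">10k rows" threshold from the PR description
FIXED_SELECTIVITY = {"range": 0.3, "order_by": 0.5, "join": 0.5}


def root_total_cost(explain_json: str) -> float:
    """Return the root plan's "Total Cost" from EXPLAIN (FORMAT JSON) output."""
    # PostgreSQL emits a one-element array: [{"Plan": {"Total Cost": ..., ...}}]
    return float(json.loads(explain_json)[0]["Plan"]["Total Cost"])


def cost_delta_improvement(before_json: str, after_json: str) -> float:
    """Fractional cost reduction between plans without/with the hypothetical index."""
    before = root_total_cost(before_json)
    after = root_total_cost(after_json)
    if before <= 0:
        return 0.0
    # Clamp at zero: an index that makes the plan costlier is no improvement.
    return max(0.0, (before - after) / before)


def heuristic_improvement(kind: str, row_count: int) -> float:
    """Estimate improvement as 1 - selectivity for one predicate kind."""
    if kind == "equality":
        if row_count <= LARGE_TABLE_ROWS:
            return 0.0  # assumption: small tables are not worth indexing
        selectivity = 1 / math.sqrt(row_count)
    else:
        selectivity = FIXED_SELECTIVITY[kind]
    return 1 - selectivity
```

Under these assumptions, an equality lookup on a 1M-row table yields 1 - 1/1000 = 0.999, which is in line with the ~95-99% range the test plan expects from the HypoPG path.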

Recommendations are ranked by improvement × confidence_weight, capped at 5.
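Here is a sketch of the ranking step. The PR names the HIGH/MEDIUM/LOW confidence levels but not their weights, so the numbers in `CONFIDENCE_WEIGHT` are placeholders. `Recommendation` is a hypothetical container.

```python
from __future__ import annotations

from dataclasses import dataclass

# Placeholder weights: the PR does not state the real values.
CONFIDENCE_WEIGHT = {"HIGH": 1.0, "MEDIUM": 0.7, "LOW": 0.4}
MAX_RECOMMENDATIONS = 5


@dataclass
class Recommendation:
    ddl: str
    improvement: float  # predicted fractional improvement, 0..1
    confidence: str     # "HIGH", "MEDIUM" or "LOW"


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Sort by improvement x confidence weight (descending) and keep the top 5."""
    ordered = sorted(
        recs,
        key=lambda r: r.improvement * CONFIDENCE_WEIGHT[r.confidence],
        reverse=True,
    )
    return ordered[:MAX_RECOMMENDATIONS]
```

With these weights, a MEDIUM candidate predicting 90% (score 0.63) ranks below a HIGH candidate predicting 70% (score 0.70). This discounts lower-confidence estimates.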

Required before merge

Set DB_CONNECTION_KEY on Railway: `railway variables --service querygrade --set "DB_CONNECTION_KEY=$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')" --skip-deploys`
cryptography>=42.0 is added to both requirements.txt and requirements-prod.txt
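Fernet rejects a malformed key when the cipher is constructed. A startup check could still surface a bad DB_CONNECTION_KEY before the first connection is saved. Here is a stdlib-only sketch, where the function name is hypothetical. A Fernet key is the URL-safe base64 encoding of 32 random bytes, which is what `Fernet.generate_key()` in the Railway command produces.

```python
from __future__ import annotations

import base64
import binascii
import os
from collections.abc import Mapping


def load_connection_key(env: Mapping[str, str] | None = None) -> bytes:
    """Fail fast if DB_CONNECTION_KEY is missing or not a valid Fernet key."""
    env = os.environ if env is None else env
    raw = env.get("DB_CONNECTION_KEY", "")
    if not raw:
        raise RuntimeError("DB_CONNECTION_KEY is not set")
    try:
        decoded = base64.urlsafe_b64decode(raw.encode("ascii"))
    except (binascii.Error, UnicodeEncodeError) as exc:
        raise RuntimeError("DB_CONNECTION_KEY is not valid URL-safe base64") from exc
    if len(decoded) != 32:
        # Fernet keys carry a 16-byte signing key plus a 16-byte encryption key.
        raise RuntimeError("DB_CONNECTION_KEY must decode to exactly 32 bytes")
    return raw.encode("ascii")
```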

Test plan

python manage.py test analyzer.test_connections analyzer.test_live_schema_context analyzer.test_index_recommender analyzer.test_grade_with_live_schema — all 43 pass
python manage.py test analyzer — 637 tests, 18 fail / 4 err (same as main baseline; no regressions)
Manual end-to-end on a local PostgreSQL with HypoPG: seed a 1M-row orders table, save a connection, grade `SELECT * FROM orders WHERE customer_id = 42`, and verify a ~95-99% predicted-improvement card with valid DDL
Manual: re-run after creating the suggested index → verify it now shows the "Already exists" redundancy badge
MySQL spot-check via Docker MySQL 8 → expect MEDIUM confidence
GA4 DebugView shows index_recommendation_generated with the four params after a successful recommendation

Out of scope (follow-up issues)

Training a new ML model for index ranking — Layer 6 only captures features
Multi-statement workload analysis (DTA-style)
Auto-applying CREATE INDEX (we generate scripts; user runs them)
HypoPG installation automation (we detect + advise)

…der (#7)

Foundation for automated index recommendations (issue #7), Layers 1-3:

- Layer 1: UserDatabaseConnection model with Fernet-encrypted credentials (cryptography>=42.0, DB_CONNECTION_KEY env var). CRUD views at /connections/.
- Layer 2: LiveSchemaContext snapshots tables/columns/indexes/FKs from a user's PG/MySQL/SQLite via DatabaseIntrospector and caches them in the query_analysis_cache (2h TTL). Wires DatabaseStatisticsManager._fetch_live_statistics. AnalysisContext gains an optional live_schema.
- Layer 3: IndexRecommender service. It extracts candidates from WHERE/JOIN/ORDER BY/GROUP BY, classifies redundancy (EXACT/SUBSUMED) against existing indexes, and scores cost-benefit. Scoring uses HypoPG EXPLAIN deltas on PostgreSQL (with graceful heuristic fallback) or row-count × selectivity heuristics on MySQL/SQLite. It emits engine-specific CREATE/DROP DDL, caps output at 5 ranked recommendations, and surfaces advisories.

39 new tests (test_connections, test_live_schema_context, test_index_recommender). Full suite: 633 tests, 18 fail / 4 err, the same baseline as main (no regressions). Layers 4-9 (grade-flow integration, UI panel, ML features, GA4 event, integration tests, docs) follow in subsequent commits.
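The EXACT/SUBSUMED redundancy check can be sketched over plain column tuples. Treating SUBSUMED as "the candidate's columns are a leading prefix of an existing index" is an assumption, based on how B-tree indexes serve left-prefix lookups. The real classifier may also consider uniqueness and partial or expression indexes, which this sketch ignores.

```python
from __future__ import annotations

from enum import Enum


class Redundancy(Enum):
    NONE = "none"
    EXACT = "exact"        # an existing index has exactly these columns, in order
    SUBSUMED = "subsumed"  # candidate columns are a leading prefix of a wider index


def classify(candidate: tuple[str, ...], existing: list[tuple[str, ...]]) -> Redundancy:
    """Compare a candidate index's column list against the table's existing indexes."""
    # Check every index for an exact match first, so EXACT wins over SUBSUMED.
    if any(index_columns == candidate for index_columns in existing):
        return Redundancy.EXACT
    for index_columns in existing:
        if (
            len(index_columns) > len(candidate)
            and index_columns[: len(candidate)] == candidate
        ):
            return Redundancy.SUBSUMED
    return Redundancy.NONE
```

Under this rule, a candidate on `(customer_id)` is SUBSUMED by an existing `(customer_id, created_at)` index. A candidate on `(created_at)` is not, because it is not a leading prefix.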

Completes issue #7 Layers 4-9 on top of the Layer 1-3 foundation:

- L4: QueryAnalysis.index_recommendations JSONField (migration 0005). QueryGradeForm gains an optional user-scoped db_connection picker. When a connection is supplied, the grade_query view runs IndexRecommender after analyze_query, persists the results, and touches last_used_at. Failures are logged but never break the grade.
- L5: "Index recommendations" panel in grade_results.html. Each recommendation shows a confidence pill, redundancy badge, predicted-improvement %, and expandable CodeMirror DDL block. The panel also has a "Copy all DDL" action. Connection picker in grade_form.html. Navbar dropdown link to /connections/.
- L6: FeatureExtractor.extract_index_features companion method (existing_index_count, unindexed_join/where_columns, largest_table_rows, recommended_index_count). Kept *separate* from the 45-feature main vector so deployed HYBRID_SCORER models keep loading. Persisted under index_recommendations.index_features for future supervised training.
- L7: Server-side GA4 event index_recommendation_generated via the session-flag pattern, with recommendation_count / database_engine / confidence_high_count / redundant_filtered_count params.
- L8: 4 TransactionTestCase integration tests covering the happy path, no-connection fallback, user-scoped picker, and introspector-failure resilience. Mocks _try_hypopg_improvement to avoid alias leakage that triggers an ATOMIC_REQUESTS KeyError.

Full suite: 637 tests, 18 fail / 4 err, the same baseline as main (no regressions). +43 new passing tests (12 + 7 + 20 + 4).
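In the session-flag pattern, the view stashes the event params in the session, and a later render pops them. Popping means a given event is sent at most once. Here is a minimal sketch where the session is any mutable mapping (a Django `request.session` behaves like one). `SESSION_KEY` is a hypothetical name, and `send` stands in for whatever GA4 client the project actually uses.

```python
from __future__ import annotations

from collections.abc import Callable, MutableMapping
from typing import Any

SESSION_KEY = "pending_index_recommendation_event"  # hypothetical key name
EVENT_NAME = "index_recommendation_generated"


def flag_index_event(
    session: MutableMapping[str, Any],
    recommendations: list[dict[str, Any]],
    engine: str,
    redundant_filtered_count: int,
) -> None:
    """Run in the grade view after a successful recommendation: stash event params."""
    session[SESSION_KEY] = {
        "recommendation_count": len(recommendations),
        "database_engine": engine,
        "confidence_high_count": sum(
            1 for rec in recommendations if rec.get("confidence") == "HIGH"
        ),
        "redundant_filtered_count": redundant_filtered_count,
    }


def flush_index_event(
    session: MutableMapping[str, Any],
    send: Callable[[str, dict[str, Any]], None],
) -> bool:
    """Run when rendering results: pop the flag so the event is sent at most once."""
    params = session.pop(SESSION_KEY, None)
    if params is None:
        return False
    send(EVENT_NAME, params)
    return True
```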

After PRs #65–#68 merged, the pre-existing-failure floor was 15 (11 failures + 4 errors across 637 tests). Each of the 15 fell into one of three buckets:

- UX-pass template-string drift (sentence vs. title case, retitled headings)
- behavior drift (the anon trial removed the login gate)
- missing fixture paths

None were real bugs. Categories:

- test_anonymous_trial.test_anon_grade_page_shows_trial_banner (1): asserted "Trial mode", but no template renders that string anywhere. Switched to "free grades left", which the banner does render.
- test_feedback (5): title case → sentence case across the submit-form heading, update heading, and analytics page heading. test_feedback_button_in_results asserted "Provide Feedback", but the actual button on grade_results is labeled "Detailed feedback" (it links to the same submit_feedback URL).
- test_integration (3, legacy):
  - test_authentication_required: /grade/ is no longer login-gated (anon trial flow); only history/account/connections require auth.
  - test_full_query_grading_workflow: "Query Analysis Results" was retitled to "Grade results" in the UX pass.
  - test_grade_display_formatting: the grade-{letter} CSS class was retired, and the grade pill now uses Tailwind utilities. The test now asserts the visible grade letter directly.
- test_database_analysis.test_database_analyze_get (1): page heading retitled "Database Architecture Analysis" → "Connect a database" (#54 connection-mgmt UI).
- test_optimization.test_optimization_integration_workflow (1): optimization section and tab labels lowercased and shortened.
- analyzer.tests.ParserTestCase (4 errors): setUp() looked for sample logs under analyzer/samples/, but they live in the repo-root samples/ dir. Fixed the path computation.

After this change: `python manage.py test analyzer` → 637 tests, 0 failures, 0 errors, 14 skipped.
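Here is a sketch of the ParserTestCase path computation. It assumes the tests live in analyzer/tests.py, so two `parent` hops from the module file reach the repo root. The helper name is hypothetical.

```python
from pathlib import Path


def samples_dir(test_module_file: str) -> Path:
    """Resolve the repo-root samples/ dir from a module inside analyzer/.

    analyzer/tests.py -> .parent is analyzer/ -> .parent is the repo root.
    """
    return Path(test_module_file).resolve().parent.parent / "samples"
```

In setUp() this would be called as `samples_dir(__file__)`. Resolving first keeps the result independent of the directory the test runner was launched from.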

ringo380 added 2 commits May 7, 2026 19:29

ringo380 merged commit 85cd4ff into main May 8, 2026
3 of 4 checks passed

ringo380 deleted the feat/index-recommendations branch May 8, 2026 02:59

ringo380 mentioned this pull request May 16, 2026 in Live Database Schema Analysis #6 (Open)

ringo380 mentioned this pull request May 21, 2026 in feat(schema): schema insights advisor (issue #6 scope A) #93 (Merged)
