feat: automated index recommendations with live schema (#7) #54

Merged: ringo380 merged 2 commits into main from feat/index-recommendations on May 8, 2026

ringo380 (Owner) commented May 8, 2026

Closes #7. Absorbs the schema-introspection slice of #6.

Summary

  • Schema-aware index recommendations driven by EXPLAIN cost-deltas (HypoPG on PostgreSQL, stats-based heuristics on MySQL/SQLite), with redundancy detection against existing indexes and engine-specific CREATE/DROP DDL.
  • Persisted per-user database connections with Fernet-encrypted credentials, CRUD UI at /connections/.
  • Wired end-to-end into the grade flow — when a user picks a saved connection, results appear in a new "Index recommendations" panel on the grade-results page.

What's in here

| Layer | What | Where |
|---|---|---|
| 1 | UserDatabaseConnection model, Fernet crypto, CRUD views/templates | analyzer/models/connection_models.py, analyzer/services/connection_crypto.py, analyzer/views/connection_views.py, analyzer/templates/analyzer/connections/ |
| 2 | LiveSchemaContext (Redis-cached schema snapshots) + wires DatabaseStatisticsManager._fetch_live_statistics | analyzer/services/live_schema_context.py, analyzer/ml/integration/database_stats.py |
| 3 | IndexRecommender service + DB-specific DDL generator | analyzer/services/index_recommender.py, analyzer/services/index_script_generator.py |
| 4 | QueryAnalysis.index_recommendations JSONField + grade-flow wiring | migration 0005, analyzer/views/query_grading_views.py, analyzer/forms/query_forms.py |
| 5 | UI panel (CodeMirror DDL blocks, confidence pill, redundancy badge) | analyzer/templates/analyzer/grade_results.html, grade_form.html, base.html |
| 6 | Companion extract_index_features for future supervised training (does not change the 45-feature main vector) | analyzer/ml/core/feature_extractor.py |
| 7 | GA4 index_recommendation_generated server-side event via session-flag pattern | analyzer/views/query_grading_views.py |
| 8 | 43 new tests (12 connections + 7 live-schema + 20 recommender + 4 integration) | analyzer/test_connections.py, test_live_schema_context.py, test_index_recommender.py, test_grade_with_live_schema.py |

Cost-benefit approach (why not HybridQueryGrader?)

HybridQueryGrader scores query text patterns — it has no awareness of indexes or schema, so feeding hypothetical indexes through it would produce identical scores. Real cost-benefit needs EXPLAIN-grounded data:

  • PostgreSQL: detects HypoPG via pg_extension, creates each candidate as a hypothetical index, and diffs the query's Total Cost from EXPLAIN (FORMAT JSON) with and without it. HIGH confidence. If HypoPG isn't installed, falls back to the heuristic and emits an advisory.
  • MySQL / SQLite: row-count × selectivity heuristic (1/√N for equality on tables >10k rows, 0.3 for range, 0.5 for ORDER BY/JOIN). MEDIUM confidence with row counts, LOW without.
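
The two scoring paths above can be sketched as plain functions. This is a minimal sketch, not the PR's IndexRecommender code: every function name is hypothetical, and mapping a selectivity s to a predicted improvement of (1 − s) × 100% is an assumption. On a live PostgreSQL connection the two EXPLAIN documents would come from running `EXPLAIN (FORMAT JSON)` on the query before and after `SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)')`; HypoPG's hypothetical indexes are visible to plain EXPLAIN but not to EXPLAIN ANALYZE.

```python
import json
import math
from typing import Optional


def total_cost(explain_json: str) -> float:
    """Top-level "Total Cost" from EXPLAIN (FORMAT JSON) output.

    PostgreSQL returns a one-element JSON array whose "Plan" node holds the
    planner's cost estimate for the whole query.
    """
    return float(json.loads(explain_json)[0]["Plan"]["Total Cost"])


def cost_delta_improvement(before_json: str, after_json: str) -> float:
    """Percent cost reduction from a baseline plan to a plan built while a
    hypothetical index exists; clamped at 0 when the index does not help."""
    before, after = total_cost(before_json), total_cost(after_json)
    if before <= 0:
        return 0.0
    return max(0.0, (before - after) / before * 100.0)


def heuristic_selectivity(predicate: str, row_count: Optional[int]) -> Optional[float]:
    """Estimated fraction of rows an index lookup touches, using the PR's
    constants. Returns None when no estimate applies."""
    if predicate == "equality":
        # Assumption: only tables above 10k rows get the 1/sqrt(N) estimate;
        # the PR does not say how smaller or unknown-size tables are handled.
        if row_count is not None and row_count > 10_000:
            return 1.0 / math.sqrt(row_count)
        return None
    if predicate == "range":
        return 0.3
    if predicate in ("order_by", "join"):
        return 0.5
    return None


def heuristic_improvement(predicate: str, row_count: Optional[int]) -> Optional[float]:
    """Assumed mapping: predicted improvement (%) = (1 - selectivity) * 100."""
    sel = heuristic_selectivity(predicate, row_count)
    return None if sel is None else (1.0 - sel) * 100.0
```

For example, an equality predicate on a 1,000,000-row table gets a selectivity of 1/√1,000,000 = 0.001 under this sketch.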

Recommendations are ranked by improvement × confidence_weight, capped at 5.
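
The ranking rule can be sketched as a weighted sort. The numeric confidence weights below are assumptions (the PR names HIGH/MEDIUM/LOW but not their weights), and the Recommendation dataclass is a hypothetical stand-in for whatever IndexRecommender actually returns.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed weights: the PR names the confidence levels, not their values.
CONFIDENCE_WEIGHT = {"HIGH": 1.0, "MEDIUM": 0.7, "LOW": 0.4}
MAX_RECOMMENDATIONS = 5  # cap stated in the PR


@dataclass
class Recommendation:
    """Hypothetical stand-in for one IndexRecommender result."""
    table: str
    columns: Tuple[str, ...]
    improvement_pct: float  # predicted cost reduction, 0-100
    confidence: str         # "HIGH", "MEDIUM" or "LOW"


def rank(recs: List[Recommendation], limit: int = MAX_RECOMMENDATIONS) -> List[Recommendation]:
    """Sort by improvement x confidence weight, best first, and keep `limit`."""
    return sorted(
        recs,
        key=lambda r: r.improvement_pct * CONFIDENCE_WEIGHT[r.confidence],
        reverse=True,
    )[:limit]
```

Because the key is a product, a MEDIUM-confidence 90% estimate (90 × 0.7 = 63) outranks a HIGH-confidence 60% one under these weights, while a LOW-confidence 99% estimate (39.6) falls below both.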

Required before merge

  • Set DB_CONNECTION_KEY on Railway (railway variables --service querygrade --set "DB_CONNECTION_KEY=$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')" --skip-deploys)
  • cryptography>=42.0 is added to both requirements.txt and requirements-prod.txt

Test plan

  • python manage.py test analyzer.test_connections analyzer.test_live_schema_context analyzer.test_index_recommender analyzer.test_grade_with_live_schema — all 43 pass
  • python manage.py test analyzer — 637 tests, 18 fail / 4 err (same as main baseline; no regressions)
  • Manual end-to-end on a local PostgreSQL with HypoPG: seed a 1M-row orders table, save the connection, grade SELECT * FROM orders WHERE customer_id = 42, and verify the recommendation card shows ~95-99% predicted improvement with valid DDL
  • Manual: re-run after creating the suggested index and verify it now shows the "Already exists" redundancy badge
  • MySQL spot-check via Docker MySQL 8 → expect MEDIUM confidence
  • GA4 DebugView shows index_recommendation_generated with the four params after a successful recommendation

Out of scope (follow-up issues)

  • Training a new ML model for index ranking — Layer 6 only captures features
  • Multi-statement workload analysis (DTA-style)
  • Auto-applying CREATE INDEX (we generate scripts; user runs them)
  • HypoPG installation automation (we detect + advise)

ringo380 added 2 commits May 7, 2026 19:29
…der (#7)

Foundation for automated index recommendations (issue #7), Layers 1-3:

- Layer 1: UserDatabaseConnection model with Fernet-encrypted credentials
  (cryptography>=42.0, DB_CONNECTION_KEY env var). CRUD views at /connections/.
- Layer 2: LiveSchemaContext snapshots tables/columns/indexes/FKs from a
  user's PG/MySQL/SQLite via DatabaseIntrospector and caches them in the
  query_analysis_cache (2h TTL). Wires DatabaseStatisticsManager._fetch_live_statistics.
  AnalysisContext gains an optional live_schema.
- Layer 3: IndexRecommender service — extracts candidates from
  WHERE/JOIN/ORDER BY/GROUP BY, classifies redundancy (EXACT/SUBSUMED)
  against existing indexes, scores cost-benefit via HypoPG EXPLAIN-deltas
  on PostgreSQL (with graceful heuristic fallback) or row-count×selectivity
  heuristics on MySQL/SQLite, and emits engine-specific CREATE/DROP DDL.
  Caps at 5 ranked recommendations, surfaces advisories.

39 new tests (test_connections, test_live_schema_context,
test_index_recommender). Full suite: 633 tests, 18 fail / 4 err — same
baseline as main (no regressions). Layers 4-9 (grade-flow integration,
UI panel, ML features, GA4 event, integration tests, docs) follow in
subsequent commits.
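
The EXACT/SUBSUMED redundancy classification from Layer 3 can be sketched with the B-tree leftmost-prefix rule. This is an assumption about what the two classes mean (EXACT: same ordered column list; SUBSUMED: the candidate is a leading prefix of a wider existing index); the enum and function names are hypothetical, and the sketch ignores partial, expression, and non-B-tree indexes.

```python
from enum import Enum
from typing import Iterable, Optional, Sequence


class Redundancy(Enum):
    EXACT = "exact"        # an index with the same ordered columns exists
    SUBSUMED = "subsumed"  # candidate is a leading prefix of a wider index


def classify_redundancy(
    candidate: Sequence[str], existing: Iterable[Sequence[str]]
) -> Optional[Redundancy]:
    """Classify a candidate index against a table's existing indexes.

    A B-tree on (a, b) can already serve lookups on (a) alone, so a
    candidate that is a leading prefix of an existing index adds nothing.
    Returns None when the candidate is genuinely new.
    """
    # Simplification: case-fold names (quoted PostgreSQL identifiers are
    # actually case-sensitive).
    cand = tuple(c.lower() for c in candidate)
    if not cand:
        return None
    subsumed = False
    for index_cols in existing:
        cols = tuple(c.lower() for c in index_cols)
        if cols == cand:
            return Redundancy.EXACT  # strongest class, stop early
        if len(cols) > len(cand) and cols[: len(cand)] == cand:
            subsumed = True
    return Redundancy.SUBSUMED if subsumed else None
```

Column order matters here: an existing index on (customer_id, created_at) subsumes a candidate on customer_id but not one on created_at.
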
Completes issue #7 Layers 4-9 on top of the Layers 1-3 foundation:

- L4: QueryAnalysis.index_recommendations JSONField (migration 0005);
  QueryGradeForm gains optional user-scoped db_connection picker;
  grade_query view runs IndexRecommender after analyze_query when a
  connection is supplied, persists results, touches last_used_at.
  Failures are logged but never break the grade.
- L5: "Index recommendations" panel in grade_results.html — confidence
  pill, redundancy badge, predicted-improvement %, expandable CodeMirror
  DDL block per recommendation, "Copy all DDL" action. Connection picker
  in grade_form.html. Navbar dropdown link to /connections/.
- L6: FeatureExtractor.extract_index_features companion method
  (existing_index_count, unindexed_join/where_columns, largest_table_rows,
  recommended_index_count). Kept *separate* from the 45-feature main
  vector so deployed HYBRID_SCORER models keep loading. Persisted under
  index_recommendations.index_features for future supervised training.
- L7: Server-side GA4 event index_recommendation_generated via session-flag
  pattern with recommendation_count / database_engine /
  confidence_high_count / redundant_filtered_count params.
- L8: 4 TransactionTestCase integration tests covering the happy path,
  no-connection fallback, user-scoped picker, and introspector-failure
  resilience. Mocks _try_hypopg_improvement to avoid alias leakage that
  triggers ATOMIC_REQUESTS KeyError.

Full suite: 637 tests, 18 fail / 4 err — same baseline as main
(no regressions). +43 new passing tests (12 + 7 + 20 + 4).
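
The L4 and L7 behaviour (recommender failures never break the grade; a session flag carries the GA4 params to a later point where the event is sent) can be sketched framework-free. Everything here is hypothetical except the event name and its four parameter names: the session key, the function names, and the shape of the recommender result are assumptions, and a plain dict stands in for Django's request.session.

```python
import logging
from typing import Any, Callable, MutableMapping, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical session key; the PR only names the event and its four params.
GA4_SESSION_KEY = "ga4_pending_index_event"


def attach_index_recommendations(
    run_recommender: Callable[[], dict],
    session: MutableMapping[str, Any],
    engine: str,
) -> Optional[dict]:
    """Run the recommender. On any failure, log and return None so the grade
    itself still renders. On success, stash the GA4 params in the session so
    a later step can emit index_recommendation_generated."""
    try:
        result = run_recommender()
    except Exception:
        logger.exception("Index recommendation failed; continuing without it")
        return None

    recs = result.get("recommendations", [])
    session[GA4_SESSION_KEY] = {
        "recommendation_count": len(recs),
        "database_engine": engine,
        "confidence_high_count": sum(1 for r in recs if r.get("confidence") == "HIGH"),
        "redundant_filtered_count": result.get("redundant_filtered_count", 0),
    }
    return result


def pop_ga4_event(session: MutableMapping[str, Any]) -> Optional[Tuple[str, dict]]:
    """Consume the flag exactly once; returns (event_name, params) or None."""
    params = session.pop(GA4_SESSION_KEY, None)
    return None if params is None else ("index_recommendation_generated", params)
```

Catching a bare Exception is deliberate in this sketch: it mirrors the "logged but never break the grade" guarantee, at the cost of keeping bugs off the user-facing path (they still reach the log). Popping the flag makes the event fire once per recommendation rather than on every page reload.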
ringo380 merged commit 85cd4ff into main on May 8, 2026
3 of 4 checks passed
ringo380 deleted the feat/index-recommendations branch on May 8, 2026 at 02:59
ringo380 added a commit that referenced this pull request May 17, 2026
After PRs #65#68 merged, the pre-existing-failure floor was 15
(11 failures + 4 errors / 637 tests). All 15 were either UX-pass
template-string drift (sentence vs. title case, retitled headings),
behavior drift (anon trial removed login gate), or missing fixture
paths. None were real bugs.

Categories:

- test_anonymous_trial.test_anon_grade_page_shows_trial_banner (1)
  Asserted "Trial mode" — no template renders that string anywhere.
  Switched to "free grades left", which the banner does render.

- test_feedback (5)
  Title-case → sentence-case across submit form heading, update
  heading, and analytics page heading. test_feedback_button_in_results
  asserted "Provide Feedback" but the actual button on grade_results
  is labeled "Detailed feedback" (links to the same submit_feedback URL).

- test_integration (3, legacy)
  - test_authentication_required: /grade/ is no longer login-gated
    (anon trial flow); only history/account/connections require auth.
  - test_full_query_grading_workflow: "Query Analysis Results"
    retitled to "Grade results" in the UX pass.
  - test_grade_display_formatting: grade-{letter} CSS class was
    retired; grade pill now uses Tailwind utilities. Assert visible
    grade letter directly.

- test_database_analysis.test_database_analyze_get (1)
  Page heading retitled "Database Architecture Analysis" → "Connect
  a database" (#54 connection-mgmt UI).

- test_optimization.test_optimization_integration_workflow (1)
  Optimization section + tab labels lowercased and shortened.

- analyzer.tests.ParserTestCase (4 errors)
  setUp() looked for sample logs under analyzer/samples/ but they
  live at the repo-root samples/ dir. Fixed the path computation.
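
The path fix amounts to anchoring on the test module's own location rather than on analyzer/. A minimal sketch, assuming the test module is analyzer/tests.py (as the dotted name analyzer.tests suggests) and using a hypothetical samples_dir helper:

```python
from pathlib import Path


def samples_dir(test_module_file: str) -> Path:
    """Resolve the repo-root samples/ directory from a test module that
    lives directly in analyzer/ (e.g. analyzer/tests.py -> <repo>/samples)."""
    # .parent -> analyzer/, .parent.parent -> repository root
    return Path(test_module_file).resolve().parent.parent / "samples"
```

In ParserTestCase.setUp() this would be called as samples_dir(__file__). If tests were later split into an analyzer/tests/ package, one more .parent would be needed.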

After this change: `python manage.py test analyzer` → 637 tests,
0 failures, 0 errors, 14 skipped.

Linked issue: Automated Index Recommendations
