This project demonstrates an end-to-end SQL query optimization system with ML-based prediction, dynamic indexing, real-time monitoring via InfluxDB + Grafana, and benchmarking using JMeter. The backend is powered by a Flask API that applies query rewrites, logs performance metrics, and adapts based on historical patterns.
-
Clone the repository
-
Install Python dependencies
Ensure you are using Python 3.8+. Install requirements: pip install mysql-connector-python flask scikit-learn pandas -
Start MySQL and InfluxDB
Make sure the following services are running locally:
MySQL (with user: your_username & password, database: ecommerce_db)
InfluxDB (port 8086, database: jmeter) -
Set up MySQL schema and data
Run the provided SQL setup file or manually execute sql_setup.sql to populate tables like products, reviews, orders. -
Start the Flask API
python ml_optimization_script.py
The API will start at: http://localhost:5000/query/ -
Run JMeter Benchmark
Open comparison.jmx in Apache JMeter
Add Backend Listener for InfluxDB
Ensure JSON Extractors and JSR223 Listeners are properly placed
Start test -
View in Grafana
Access Grafana at http://localhost:3000
--> Data Source Configuration
- Go to: Gear Icon → Data Sources
- Click: Add data source
- Select: InfluxDB
- Set:
- URL: http://localhost:8086
- Database: jmeter
- Version: InfluxQL
- Click Save & Test
Import the dashboard JSON provided or manually configure panels to visualize metrics from:
jmeter_metrics (JDBC requests)
optimized_query_exec (Flask-optimized responses)
Python 3.8+
Flask
pandas, numpy, scikit-learn
MySQL Server
InfluxDB (v1.x)
Grafana
Apache JMeter
MySQL Connector for Python
--> Ananya Penuballi
Designed and created the MySQL schema and dataset for products, reviews, and orders.
Implemented core SQL optimization logic, including:
- Index suggestion and creation logic
- Query rewriting strategies (e.g., USE INDEX, SQL_NO_CACHE)
- Handling of subqueries, JOINs, and aggregations
--> Sai Sanjay Yerunkar
Led the Flask API development, responsible for:
Predictive optimization using a trained ML model (RandomForestRegressor)
- Executing optimized queries and logging performance
- Returning clean JSON responses with applied optimizations
Built the JMeter setup for benchmarking each query endpoint
Integrated InfluxDB and Grafana for real-time visualization
Ensured correct query execution time logging via JSR223 scripting and JSON extractors
The optimized_query_exec InfluxDB measurement logs only query execution time, not API overhead.
Indexes are created dynamically if missing, with error-handling for duplicate index cases.
All dashboards are parameterized to allow per-query comparison over time.