Implemention of a Declarative Query Language for High Level Machine Learning Application Design
- Python 3.11 (required)
- JAVA runtime
- ReactJS
- Django rest framework
- PostgreSQL
- OS : Linux (to perform autosklearn)
firstly clone this repositoroy by
git clone "https://github.com/smart-db-lab/mql.git"
Database setup:
- install postgresql and login
- create a postgres database
- install mysql and login
- create a mysql database
- Create a file with name .env in the server folder where .env.example file appear
- paste all variable from .env.example to the created .env file and replace your credentials with the variable name.
To migrate database schema:
python manage.py migrate
To run Frontend:
cd client
npm install
npm run dev
To run Backend:
cd server
pip install -r requirements.txt
python manage.py runserver
Also install Java ( openjdk ) runtime environment. [It is required to run h2o automl package]
- Now keep running frontend, backend and databases. go to the application. [e.g. https://localhost:5173]
- create account and login then switch to operation tab to perform query.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# 1. Load data (downloaded from Kaggle and saved as 'train.csv')
df = pd.read_csv('./train.csv')
# 2. Basic preprocessing: select a few features and fill missing ages
features = ['Pclass', 'Sex', 'Age', 'Fare']
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df['Age'].fillna(df['Age'].median(), inplace=True)
X = df[features]
y = df['Survived']
# 3. Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42 )
# 4. Train a logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# 5. Evaluate
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f'Accuracy: {acc:.2f}')
Output
Accuracy: 0.80
INSPECT Sex ENCODING METHOD Ordinal FROM train;
INSPECT Age IMPUTE USING STRATEGY median FROM train;
GENERATE DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train;
If the image does not display, download it here.
GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;
If the image does not display, download it here.
GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES Pclass,Sex,Age,Fare FROM train;
GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES age,rad FROM Boston;
GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES PetalLengthCm,PetalWidthCm FROM Iris;
If the image does not display, download it here.
GENERATE CLASSIFICATION Class ALGORITHM KNN LABEL ProductID FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;
If the image does not display, download it here.
GENERATE DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train;
If the image does not display, download it here.
GENERATE CLASSIFICATION Species ALGORITHM KNN WITH ACCURACY 0 LABEL ProductID FEATURES SepalLengthCm,PetalLengthCm,PetalWidthCm FROM Iris ;
If the image does not display, download it here.
GENERATE DISPLAY OF PREDICTION Epsilon ALGORITHM LR LABEL serialNo FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;
If the image does not display, download it here.
GENERATE DISPLAY OF PREDICTION medv ALGORITHM LR WITH ACCURACY 0 LABEL serialNo FEATURES age,rad FROM Boston WHERE age>50 ;
If the image does not display, download it here.
GENERATE CLASSIFICATION Species ALGORITHM KNN LABEL ProductID FEATURES SepalLengthCm,PetalLengthCm,PetalWidthCm FROM Iris ;
If the image does not display, download it here.
INSPECT medv CHECKNULL FROM Boston;
INSPECT Survived CHECKNULL FROM train;
INSPECT species CHECKNULL FROM Iris;
INSPECT Species ENCODING METHOD Ordinal FROM Iris;
INSPECT Survived ENCODING METHOD One-Hot FROM train;
INSPECT * DEDUPLICATE FROM Boston;
INSPECT medv DEDUPLICATE FROM Boston;
INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM Boston;
INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM train;
INSPECT PetalLengthCm CATEGORIZE INTO L1,L2,L3,L4 FROM Iris;
INSPECT medv checknull | medv deduplicate | age categorize INTO l1,l2 FROM Boston;
SHOW Customers;
DROP DATASET Customers;
INSPECT indus IMPUTE USING STRATEGY mean FROM Boston;
INSPECT * IMPUTE FROM Boston;
GENERATE DISPLAY OF PREDICTION medv ALGORITHM LR WITH ACCURACY 0 LABEL serialNo FEATURES age,rad FROM Boston WHERE INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM Boston;

If the image does not display, [download it here](./all_report/output-9-report.pdf).
GENERATE DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG WITH ACCURACY 0 LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train WHERE INSPECT * IMPUTE USING STRATEGY mean FROM train;

If the image does not display, [download it here](./all_report/output-10-report.pdf).
SELECT * FROM "Iris";
Algorithm Type | Algorithm query; Code | Full Name |
---|---|---|
Prediction | LR | Linear Regression |
RF | Random Forest Regressor | |
KNN | K-Neighbors Regressor | |
SVR | Support Vector Regressor | |
GBR | Gradient Boosting Regressor | |
Classification | LOG | Logistic Regression |
RFC | Random Forest Classifier | |
KNN | K-Neighbors Classifier | |
SVC | Support Vector Classifier | |
GBC | Gradient Boosting Classifier | |
Clustering | KMEANS | K-Means Clustering |
AGGLOMERATIVE | Agglomerative Clustering | |
DBSCAN | DBSCAN Clustering |
- PyCaret
- TPOT (Tree-based Pipeline Optimization Tool)
- FLAML (Fast Lightweight AutoML)
- H2O.ai
- Autosklearn (Linux environment required)
- Features in FEATURES clause must be comma-separated with no spaces between feature names
- If a feature name contains spaces, enclose it in double quotes (
"
) - Example:
FEATURES age,income,"annual salary","job title"
- Incorrect:
FEATURES age, income, annual salary
(spaces after commas and unquoted space-containing names) - Correct:
FEATURES age,income,"annual salary"
-
Database and table names are case-sensitive
-
Algorithm codes must be in uppercase (e.g.,
LR
,KNN
,LOG
) -
Column names must match exactly as they appear in your database
-
Ensure your database connection is active before running queries