Skip to content

hmjamil/mql

Repository files navigation

Machine Learning Query Language (MQL)

Implemention of a Declarative Query Language for High Level Machine Learning Application Design

Used Technology

  • Python 3.11 (required)
  • JAVA runtime
  • ReactJS
  • Django rest framework
  • PostgreSQL
  • OS : Linux (to perform autosklearn)

To run the application

firstly clone this repositoroy by

git clone "https://github.com/smart-db-lab/mql.git"

Database setup:

  • install postgresql and login
  • create a postgres database
  • install mysql and login
  • create a mysql database

Setup environment variable

  • Create a file with name .env in the server folder where .env.example file appear
  • paste all variable from .env.example to the created .env file and replace your credentials with the variable name.

To migrate database schema:

    python manage.py migrate

To run Frontend:

    cd client
    npm install
    npm  run dev

To run Backend:

    cd server
    pip install -r requirements.txt
    python manage.py runserver

Also install Java ( openjdk ) runtime environment. [It is required to run h2o automl package]

All done?

  • Now keep running frontend, backend and databases. go to the application. [e.g. https://localhost:5173]
  • create account and login then switch to operation tab to perform query.

Traditional machine learning using python

        import pandas as pd
        from sklearn.model_selection import train_test_split 
        from sklearn.linear_model import LogisticRegression 
        from sklearn.metrics import accuracy_score 
        # 1. Load data (downloaded from Kaggle and saved as 'train.csv') 
        df = pd.read_csv('./train.csv') 
        # 2. Basic preprocessing: select a few features and fill missing ages 
        features = ['Pclass', 'Sex', 'Age', 'Fare']

        df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
        df['Age'].fillna(df['Age'].median(), inplace=True) 
        X = df[features] 
        y = df['Survived'] 
        # 3. Split into train/test sets 
        X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42 ) 
        # 4. Train a logistic regression model 

        model = LogisticRegression(max_iter=200) 
        model.fit(X_train, y_train) 
        # 5. Evaluate 
        y_pred = model.predict(X_test) 
        acc = accuracy_score(y_test, y_pred)
        print(f'Accuracy: {acc:.2f}')

Output

        Accuracy: 0.80

MQL Query

INSPECT statement

        INSPECT Sex ENCODING METHOD Ordinal FROM train;
        INSPECT Age IMPUTE USING STRATEGY median FROM train;


GENERATE Statement

        GENERATE  DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train;

Output of MQL Query:

MQL Report

If the image does not display, download it here.

Examples of MQL query

Generate

Cluster

    GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;

MQL Report

If the image does not display, download it here.

    GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES  Pclass,Sex,Age,Fare FROM train;


    GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES  age,rad FROM Boston;
        GENERATE DISPLAY OF CLUSTERING ALGORITHM KMeans FEATURES  PetalLengthCm,PetalWidthCm FROM Iris;

MQL Report

If the image does not display, download it here.

Classification

    GENERATE  CLASSIFICATION Class ALGORITHM KNN  LABEL ProductID FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;

MQL Report If the image does not display, download it here.

    GENERATE  DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train;

MQL Report If the image does not display, download it here.

    GENERATE  CLASSIFICATION Species ALGORITHM KNN WITH ACCURACY 0 LABEL ProductID FEATURES SepalLengthCm,PetalLengthCm,PetalWidthCm FROM Iris ;

MQL Report If the image does not display, download it here.

Prediction

    GENERATE DISPLAY OF PREDICTION Epsilon ALGORITHM LR LABEL serialNo FEATURES CAtomCount,TotalAtomCount,HAtomCount FROM combined ;

MQL Report If the image does not display, download it here.

    GENERATE DISPLAY OF PREDICTION medv ALGORITHM LR WITH ACCURACY 0 LABEL serialNo FEATURES age,rad FROM  Boston WHERE age>50 ;

MQL Report If the image does not display, download it here.

    GENERATE  CLASSIFICATION Species ALGORITHM KNN LABEL ProductID FEATURES SepalLengthCm,PetalLengthCm,PetalWidthCm FROM Iris ;

MQL Report If the image does not display, download it here.

Inspect

CHECKNULL

    INSPECT medv CHECKNULL  FROM Boston;
    INSPECT Survived CHECKNULL  FROM train;
    INSPECT species CHECKNULL  FROM Iris;

ENCODING

    INSPECT Species ENCODING METHOD Ordinal FROM Iris;
    INSPECT Survived ENCODING METHOD One-Hot FROM train;

DEDUPLICATE

    INSPECT * DEDUPLICATE FROM Boston;
    INSPECT medv DEDUPLICATE FROM Boston;

CATEGORIZE

    INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM Boston;
    INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM train;
    INSPECT PetalLengthCm CATEGORIZE INTO L1,L2,L3,L4 FROM Iris;

Combined INSPECT operation

    INSPECT medv checknull | medv deduplicate | age categorize INTO l1,l2 FROM  Boston;

Show datset

    SHOW Customers;

Drop datset table

    DROP DATASET Customers;

IMPUTE

    INSPECT indus IMPUTE USING STRATEGY mean FROM Boston;
    INSPECT * IMPUTE  FROM Boston;

Composite

        GENERATE DISPLAY OF PREDICTION medv ALGORITHM LR WITH ACCURACY 0 LABEL serialNo FEATURES age,rad FROM  Boston  WHERE INSPECT age CATEGORIZE INTO L1,L2,L3,L4 FROM Boston;
![MQL Report](./all_report/output-9-report.pdf)
If the image does not display, [download it here](./all_report/output-9-report.pdf).



        GENERATE  DISPLAY OF CLASSIFICATION Survived ALGORITHM LOG WITH ACCURACY 0 LABEL RecordID FEATURES Pclass,Sex,Age,Fare FROM train WHERE INSPECT * IMPUTE USING STRATEGY mean FROM train;
![MQL Report](./all_report/output-10-report.pdf)
If the image does not display, [download it here](./all_report/output-10-report.pdf).

SQL query

    SELECT * FROM "Iris";

Available ML Algorithms

Algorithm Type Algorithm query; Code Full Name
Prediction LR Linear Regression
RF Random Forest Regressor
KNN K-Neighbors Regressor
SVR Support Vector Regressor
GBR Gradient Boosting Regressor
Classification LOG Logistic Regression
RFC Random Forest Classifier
KNN K-Neighbors Classifier
SVC Support Vector Classifier
GBC Gradient Boosting Classifier
Clustering KMEANS K-Means Clustering
AGGLOMERATIVE Agglomerative Clustering
DBSCAN DBSCAN Clustering

AutoML Technology

  • PyCaret
  • TPOT (Tree-based Pipeline Optimization Tool)
  • FLAML (Fast Lightweight AutoML)
  • H2O.ai
  • Autosklearn (Linux environment required)

Important Notes and Cautions

⚠️ Feature Naming Guidelines:

  • Features in FEATURES clause must be comma-separated with no spaces between feature names
  • If a feature name contains spaces, enclose it in double quotes (")
  • Example: FEATURES age,income,"annual salary","job title"
  • Incorrect: FEATURES age, income, annual salary (spaces after commas and unquoted space-containing names)
  • Correct: FEATURES age,income,"annual salary"

⚠️ General Syntax Cautions:

  • Database and table names are case-sensitive

  • Algorithm codes must be in uppercase (e.g., LR, KNN, LOG)

  • Column names must match exactly as they appear in your database

  • Ensure your database connection is active before running queries

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •