
Explainable Natural Language Query Interface for Relational Databases Using a Multi-Agent System

For CITS5553 - Data Science Capstone Project | Semester 2, 2025


Demo Screenshot


I. Project Overview

This project implements an explainable natural language query interface for relational databases using a multi-agent system. It allows users to interact with databases by asking questions in natural language, and the system generates SQL queries to retrieve the relevant data. The key features include:

  • Multi-Agent System: Utilizes multiple AI agents to handle different aspects of the query process, including understanding the question, generating SQL, executing the query, and explaining the results.
  • Explainability: Provides explanations for the generated SQL queries and the results, enhancing user trust and understanding.
  • Database Support: Supports multiple SQLite databases, including the Spider dataset, allowing users to query various database schemas.
  • User-Friendly Interface: A web-based frontend built with Next.js for easy interaction.
  • Backend: A Django REST API backend to manage database interactions and agent coordination.
  • Dockerized Deployment: The entire application can be run using Docker, simplifying setup and deployment.

The architecture of the system is illustrated below:

(System architecture diagram)

II. Setup Guide

This project is designed to be run entirely using Docker. No manual Python or Conda environment setup is required.

Before starting, Windows users should run the command below in a terminal to avoid CRLF line-ending issues:

git config --global core.autocrlf input

1. Download and Prepare the Spider Dataset

  • Download the Spider Dataset:

  • Extract and Place the Dataset for default databases feature:

    • This step can be skipped if you only want to use your own databases, but the Add All Spider feature will then be disabled.
    • Unzip the file. The extracted folder should be named spider_data.
    • Inside spider_data, the subfolder test_database contains 200+ SQLite databases, which is the full dataset.
    • Delete all unnecessary files, especially the __MACOSX folder at the root: the server (Debian) cannot read a zip file that contains it.
    • Optionally, merge the database folder into test_database (copy-paste, keeping only the non-duplicate databases), then remove the old folder so that only test_database remains, matching the tree structure shown below.
    • Do not leave duplicated .sqlite files in a zip intended for web import, or in the data folder: they add oddly named entries such as ._name.sqlite to the schema and file list.
    • Duplicates will also confuse the AI, since near-identical names are easily mixed up, especially by the retriever in the RAG pipeline.
    • Move or copy this folder into the data directory at the root of this project, so you have: data/spider_data

    Your directory should look like:

    data/
    └── spider_data/
        └── test_database/
            ├── academic/
            │   ├── academic.sqlite
            │   └── schema.sql
            ├── flight_1/
            │   ├── flight_1.sqlite
            │   └── schema.sql
            ├── car_1/
            │   ├── car_1.sqlite
            │   └── schema.sql
            └── ... (200+ more databases)
    

    Note: The Spider databases are not included in this repository due to size. Each user must download and place them manually.
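The cleanup steps above can be sketched as a small script. This is a hedged example: it assumes the archive has already been extracted, and it is demonstrated on a throwaway directory so it is safe to run as-is (in practice you would pass `data/spider_data`):

```python
import shutil
import tempfile
from pathlib import Path

def clean_spider_tree(root: Path) -> None:
    """Remove macOS zip debris that the Debian server cannot handle."""
    # Drop the __MACOSX folder at the root of the extracted archive.
    shutil.rmtree(root / "__MACOSX", ignore_errors=True)
    # Drop resource-fork files like ._academic.sqlite anywhere in the tree.
    for junk in root.rglob("._*"):
        junk.unlink()

# Demonstration on a throwaway copy (use Path("data/spider_data") for real).
demo = Path(tempfile.mkdtemp()) / "spider_data"
(demo / "__MACOSX").mkdir(parents=True)
db_dir = demo / "test_database" / "academic"
db_dir.mkdir(parents=True)
(db_dir / "academic.sqlite").touch()
(db_dir / "._academic.sqlite").touch()  # simulated macOS debris

clean_spider_tree(demo)
print(sorted(p.name for p in demo.rglob("*") if p.is_file()))  # ['academic.sqlite']
```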

2. Install Docker

3. Start the Application

  • Open a terminal and navigate to the web application directory:

    cd web_app
  • Start the application with Docker Compose:

    docker-compose up --build

    The first run may take a few minutes as Docker builds the images.

  • The Docker setup automatically mounts the ../data directory, so your Spider databases (if present) will be accessible to the backend.

4. Access the Application

5. Login Credentials

6. Add Your OpenAI API Key

  • After logging in, click the "API Key Settings" button in the menu.

  • Enter your OpenAI API key (get one from https://platform.openai.com/account/api-keys).

  • Click Save.

    Note: Each user must enter their own OpenAI API key. The API key in the .env file is for development/testing only and is not currently in use.

7. Add the Spider Databases or Your Own Databases

  • Go to "View/Import/Delete Databases" in the menu.
  • Click the purple "Add All Spider" button to upload all Spider databases and generate their schemas.
  • Alternatively, you can upload your own SQLite databases using the "Add" button. The application accepts .sqlite files with version suffixes up to 6 (most commonly .sqlite3). You can zip multiple files and upload them together, or upload them one by one. Ensure there are no duplicate database names and no __MACOSX folders inside the zip file.
  • After uploading, the databases will appear in the list, where you can view their schemas, delete them, and so on.
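A minimal sketch of packaging your own databases for the zip-import path, skipping macOS debris and refusing duplicate names. The file names and paths here are illustrative throwaway examples, not part of the repo:

```python
import tempfile
import zipfile
from pathlib import Path

def pack_databases(db_files, zip_path: Path) -> None:
    """Zip .sqlite files for web import, skipping ._ debris and duplicates."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w") as zf:
        for db in db_files:
            if db.name.startswith("._"):  # macOS resource-fork junk
                continue
            if db.name in seen:           # duplicate names pollute the schema list
                raise ValueError(f"duplicate database name: {db.name}")
            seen.add(db.name)
            zf.write(db, arcname=db.name)

# Demonstration with throwaway files (substitute your real .sqlite paths).
tmp = Path(tempfile.mkdtemp())
for name in ("sales.sqlite", "inventory.sqlite", "._sales.sqlite"):
    (tmp / name).touch()
out = tmp / "databases.zip"
pack_databases(sorted(tmp.glob("*.sqlite")), out)
print(zipfile.ZipFile(out).namelist())
```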

8. Test the Agents

  • Go to the chatbot and ask questions about your databases. Example question:

Find the name of all students who were in the tryout sorted in alphabetic order

  • The AI agents will use your API key to generate SQL queries and provide explanations.
  • Experiment with the agent parameters to see how they affect the results.
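To sanity-check an agent's answer, you can run the generated SQL directly against the database. The snippet below is a self-contained sketch using an in-memory database with a made-up tryout table; the real Spider schema and the SQL the agents emit may differ:

```python
import sqlite3

# In-memory stand-in for a Spider-style database (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tryout (pID INTEGER, pName TEXT);
    INSERT INTO tryout VALUES (1, 'Charles'), (2, 'Ava'), (3, 'Blake');
""")

# SQL an agent might generate for:
# "Find the name of all students who were in the tryout sorted in alphabetic order"
rows = conn.execute("SELECT pName FROM tryout ORDER BY pName").fetchall()
print([name for (name,) in rows])  # ['Ava', 'Blake', 'Charles']
```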

9. Web Servers Development

As this is a data science project, the backend reads from an external data folder, which keeps the data easy to inspect and convenient for local usage.

However, this also prevents non-local deployment. If you want to deploy the web servers (with or without Docker), you will need to change the default Spider data path to point inside the backend's media folder instead of the current data folder, which sits outside web_app.
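As a hypothetical illustration of that change (the setting name DEFAULT_SPIDER_DATA_DIR is made up; the actual constant in this repo may differ), it amounts to re-pointing the default data path at the backend's media folder:

```python
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent  # Django-style project root

# Local development: data/ sits outside web_app, next to the repo root.
# DEFAULT_SPIDER_DATA_DIR = BASE_DIR.parent / "data" / "spider_data"

# Server deployment: keep the data inside the backend's media folder instead,
# so the app has no dependency on a path outside the container.
MEDIA_ROOT = BASE_DIR / "media"
DEFAULT_SPIDER_DATA_DIR = MEDIA_ROOT / "spider_data"

print(DEFAULT_SPIDER_DATA_DIR.name)
```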


Troubleshooting: If you encounter issues, ensure Docker is running and the data/spider_data directory exists (if using the Spider dataset). For further help, consult your team.
