For CITS5553 - Data Science Capstone Project | Semester 2, 2025
This project implements an explainable natural language query interface for relational databases using a multi-agent system. Users ask questions in natural language, and the system generates and executes SQL queries to retrieve the relevant data. The key features include:
- Multi-Agent System: Utilizes multiple AI agents to handle different aspects of the query process, including understanding the question, generating SQL, executing the query, and explaining the results.
- Explainability: Provides explanations for the generated SQL queries and the results, enhancing user trust and understanding.
- Database Support: Supports multiple SQLite databases, including the Spider dataset, allowing users to query various database schemas.
- User-Friendly Interface: A web-based frontend built with Next.js for easy interaction.
- Backend: A Django REST API backend to manage database interactions and agent coordination.
- Dockerized Deployment: The entire application can be run using Docker, simplifying setup and deployment.
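Conceptually, the flow across these agents can be pictured with a minimal sketch like the one below. The class and method names here are illustrative assumptions, not the actual backend API:

```python
# Minimal sketch of the multi-agent flow described above.
# Class and method names are illustrative, not the actual backend code.

class QueryPipeline:
    """Coordinates the agents: understand -> generate SQL -> execute -> explain."""

    def __init__(self, understander, sql_generator, executor, explainer):
        self.understander = understander
        self.sql_generator = sql_generator
        self.executor = executor
        self.explainer = explainer

    def answer(self, question: str, schema: str) -> dict:
        intent = self.understander.parse(question, schema)          # interpret the question
        sql = self.sql_generator.generate(intent, schema)           # produce a SQL query
        rows = self.executor.run(sql)                               # run it on the database
        explanation = self.explainer.explain(question, sql, rows)   # explain query + results
        return {"sql": sql, "rows": rows, "explanation": explanation}
```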
The architecture of the system is illustrated below:
This project is designed to be run entirely using Docker. No manual Python or Conda environment setup is required.
Before starting, if you are on Windows, run the command below in your terminal to avoid line-ending (CRLF) issues:

```
git config --global core.autocrlf input
```
Download the Spider Dataset:

- Visit https://yale-lily.github.io/spider, or use the direct link: Google Drive Download
- Download the ZIP file to your computer.
Extract and Place the Dataset (for the default databases feature):

- This step can be skipped if you only want to use your own databases, but the "Add All Spider" feature will be disabled if you do.
- Unzip the file. The folder should be named `spider_data`.
- Inside `spider_data`, there is a subfolder named `test_database` containing 200+ SQLite databases, which is the whole dataset.
- Delete all unnecessary files, especially the `__MACOSX` folder in the root, as it will prevent the ZIP file from being read by the server (which runs Debian).
- The `database` folder can be merged into `test_database` by copy-paste if you want the non-duplicate databases; then remove the old folder, leaving only `test_database` in the tree structure shown below.
- Do not leave duplicated `.sqlite` files in a ZIP file for web import, or in the data folder, as they will add odd names such as `._name.sqlite` to the schema and file list.
- Only merge if you really need to; the near-identical names will also confuse the AI, especially the retriever in the RAG pipeline.
- Move or copy this folder into the `data` directory at the root of this project, so you have `data/spider_data`.
Your directory should look like:

```
data/
└── spider_data/
    └── test_database/
        ├── academic/
        │   ├── academic.sqlite
        │   └── schema.sql
        ├── flight_1/
        │   ├── flight_1.sqlite
        │   └── schema.sql
        ├── car_1/
        │   ├── car_1.sqlite
        │   └── schema.sql
        └── ... (200+ more databases)
```

Note: The Spider databases are not included in this repository due to size. Each user must download and place them manually.
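Before importing, you can sanity-check the folder for the issues mentioned above with a small script like this one (a sketch only; the path assumes you run it from the project root):

```python
# Flags macOS zip debris and duplicate database names in data/spider_data.
from collections import Counter
from pathlib import Path

root = Path("data/spider_data")

# __MACOSX folders and ._ resource-fork files break the server-side import.
for p in root.rglob("*"):
    if "__MACOSX" in p.parts or p.name.startswith("._"):
        print(f"debris, delete: {p}")

# Duplicate database names lead to the odd ._name.sqlite schema entries.
names = Counter(p.stem for p in root.rglob("*.sqlite"))
for name, count in names.items():
    if count > 1:
        print(f"duplicate database name: {name} ({count} copies)")
```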
- Download Docker Desktop: https://www.docker.com/products/docker-desktop/
- Install for your OS (Windows, macOS, or Linux).
- Start Docker Desktop and wait until it is running.
- Open a terminal and navigate to the web application directory:

  ```
  cd web_app
  ```

- Start the application with Docker Compose:

  ```
  docker-compose up --build
  ```

  The first run may take a few minutes as Docker builds the images.

- The Docker setup automatically mounts the `../data` directory, so your Spider databases (if present) will be accessible to the backend.
- Frontend: http://localhost:3001
- Backend API: http://localhost:8000
- Django Admin: http://localhost:8000/admin
- Django Admin Login (for controlling the backend server):
  - Visit http://localhost:8000/admin
  - Log in with:
    - Username: admin
    - Password: admin123
- Web Application Login:
  - Log in at http://localhost:3001
  - Use the same credentials (admin/admin123).
- After logging in, click the "API Key Settings" button in the menu.
- Enter your OpenAI API key (get one from https://platform.openai.com/account/api-keys).
- Click Save.

Note: Each user must enter their own OpenAI API key. The `.env` file API key is only for development/testing and is currently not in use.
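On the backend, the saved key can then be used to build a per-user client. The snippet below is only a sketch of that idea; the function and field names are assumptions, not the actual backend code:

```python
# Build an OpenAI client from the key a user saved via "API Key Settings".
# The .env key is deliberately not consulted here.
from openai import OpenAI

def client_for_user(user) -> OpenAI:
    if not user.openai_api_key:  # hypothetical field on the user model
        raise ValueError("No OpenAI API key set for this user")
    return OpenAI(api_key=user.openai_api_key)
```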
- Go to "View/Import/Delete Databases" in the menu.
- Click the purple "Add All Spider" button to upload all Spider databases and generate their schemas.
- Alternatively, you can upload your own SQLite databases using the "Add" button. The application accepts `.sqlite` files up to version 6 (currently `.sqlite3`). You can zip multiple files and upload them together, or upload them one by one. Ensure there are no duplicate database names and no `__MACOSX` folders inside the ZIP file.
- After uploading, the databases will appear in the list, where further actions are possible (view schema, delete, etc.).
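For reference, the schema of a `.sqlite` file can be read with the Python standard library as sketched below. This only illustrates the idea; it is not the application's own schema-generation code:

```python
# Dump the CREATE statements stored in an SQLite file's sqlite_master table.
import sqlite3

def dump_schema(path: str) -> str:
    with sqlite3.connect(path) as conn:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"
        ).fetchall()
    return ";\n".join(sql for (sql,) in rows)

# Example (path assumes the Spider layout shown earlier):
print(dump_schema("data/spider_data/test_database/academic/academic.sqlite"))
```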
- Go to the chatbot and ask questions about your databases. Example question (an illustrative result for it is sketched after this list):

  "Find the name of all students who were in the tryout sorted in alphabetic order"
- The AI agents will use your API key to generate SQL queries and provide explanations.
- Play with Agent Parameters
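For the example question above, the agents might return something along these lines. The table and column names here are illustrative guesses; the actual SQL depends on the target database's schema:

```python
# A plausible shape for the pipeline's answer to the example question
# (illustrative only; actual SQL depends on the database being queried).
result = {
    "sql": (
        "SELECT Player.pName FROM Player "
        "JOIN Tryout ON Player.pID = Tryout.pID "
        "ORDER BY Player.pName ASC"
    ),
    "explanation": (
        "Joins players to their tryout records and sorts the names alphabetically."
    ),
}
```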
As this is a Data Science Capstone application, the backend is connected to an external data folder so the data stays readable and convenient for local usage.
However, this also prevents non-local deployment. If you want to run the app on a web server, with Docker or without, you will need to change the default Spider data path to point inside the backend's media folder instead of the current `data` folder, which sits outside `web_app`.
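In Django settings, that change could look roughly like the sketch below; the setting names are illustrative, not the project's actual ones:

```python
# Sketch of moving the Spider data path inside the backend's media folder.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

# Local development: Spider data lives outside web_app.
# SPIDER_DATA_DIR = BASE_DIR.parent.parent / "data" / "spider_data"

# Server deployment: keep the data inside the backend's media folder instead.
MEDIA_ROOT = BASE_DIR / "media"
SPIDER_DATA_DIR = MEDIA_ROOT / "spider_data"
```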
Troubleshooting:
If you encounter issues, ensure Docker is running and the data/spider_data directory exists (if using the Spider dataset).
For further help, consult your team.

