This project is a Python-based application that acts as an AI Data Analyst. It takes a user's question in natural language, converts it into a valid SQL query, and executes it against a database to fetch the answer. This is achieved using a Large Language Model (LLM) through Ollama and the LangChain framework.
- Database Setup: Includes a script to create a sample SQLite database (`amazon.db`) with tables for Customers, Products, Orders, and Order Items, populated with dummy data.
- Web Interface: A user-friendly web interface built with Streamlit to interact with the AI Data Analyst.
- Schema Extraction: Automatically inspects the database to extract its schema (tables and columns).
- Text-to-SQL Conversion: Uses the `deepseek-r1:8b` model via Ollama and LangChain to convert a natural language prompt into an SQL query based on the extracted schema.
- Query Execution: Connects to the database, runs the generated SQL query, and retrieves the results.
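The schema-extraction step above can be sketched with Python's standard `sqlite3` module (the actual project uses SQLAlchemy, so `main.py` may differ in detail; the function name `get_schema` here is illustrative):

```python
import sqlite3

def get_schema(db_path: str) -> str:
    """Return a one-line-per-table text description of the database schema,
    suitable for pasting into an LLM prompt."""
    conn = sqlite3.connect(db_path)
    try:
        # List all user tables, then ask SQLite for each table's columns.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        lines = []
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, default, pk);
            # index 1 is the column name.
            cols = [col[1] for col in conn.execute(f"PRAGMA table_info('{table}')")]
            lines.append(f"{table}({', '.join(cols)})")
        return "\n".join(lines)
    finally:
        conn.close()
```

A description like `Customers(customer_id, name)` per table is compact enough to include in every prompt without blowing up the context window.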
AI_Data_Analyst/
├── create_database.py # Script to create and populate the database
├── main.py # Main application for Text-to-SQL logic
├── frontend.py # Streamlit web interface for the application
├── amazon.db # SQLite database file (generated by create_database.py)
└── README.md # This file
Before you begin, ensure you have the following installed:
- Python 3.8+
- Ollama: You need to have Ollama installed and running on your machine. You can download it from ollama.com.
Follow these steps to set up and run the project.
If your project is in a Git repository, clone it. Otherwise, navigate to your project directory.
It's recommended to use a virtual environment.
# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
# Install dependencies
pip install sqlalchemy langchain-ollama langchain-core streamlit

Make sure the Ollama application is running. Then, pull the `deepseek-r1:8b` model by running the following command in your terminal:
ollama pull deepseek-r1:8b

Run the create_database.py script to generate the `amazon.db` file and populate it with sample data.
python d:/All_Projects/AI_Data_Analyst/create_database.py

You should see the output: `Database and tables created with dummy data.`
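For reference, a minimal sketch of what `create_database.py` sets up. The exact column names, table spelling (e.g. `OrderItems` vs. `Order Items`), and dummy rows in the real script may differ:

```python
import sqlite3

def create_sample_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four sample tables and insert a little dummy data.
    Schema and data are illustrative, not the project's exact definitions."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Products  (product_id  INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE Orders    (order_id    INTEGER PRIMARY KEY,
                                customer_id INTEGER REFERENCES Customers(customer_id));
        CREATE TABLE OrderItems(order_id    INTEGER REFERENCES Orders(order_id),
                                product_id  INTEGER REFERENCES Products(product_id),
                                quantity    INTEGER);
    """)
    # Two of the dummy customers shown in the example output below.
    conn.execute("INSERT INTO Customers VALUES (1, 'Alice Johnson'), (2, 'Bob Smith')")
    conn.commit()
    return conn
```

Pointing `path` at `amazon.db` instead of `:memory:` would write the database to disk, which is what the real script does.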
Execute the main.py script to perform the Text-to-SQL conversion and query the database.
python d:/All_Projects/AI_Data_Analyst/main.py

The script will use the default prompt "Tell me the names of all the customers" to generate an SQL query, execute it, and print the results.
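The core Text-to-SQL step can be sketched as follows. The prompt wording is hypothetical (the template in `main.py` may differ), and calling `question_to_sql` requires the Ollama server to be running with `deepseek-r1:8b` already pulled:

```python
def build_prompt(schema: str, question: str) -> str:
    """Assemble the prompt sent to the model; wording is illustrative."""
    return (
        "You are a data analyst. Given the SQLite schema below, write a single "
        "valid SQL query that answers the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\nSQL:"
    )

def question_to_sql(schema: str, question: str) -> str:
    """Ask the local deepseek-r1:8b model (via Ollama) for a SQL query.
    Needs the Ollama app running, so the import is kept local."""
    from langchain_ollama import ChatOllama
    llm = ChatOllama(model="deepseek-r1:8b", temperature=0)
    return llm.invoke(build_prompt(schema, question)).content
```

The returned string can then be executed against `amazon.db` exactly as `main.py` does, which is what produces the result list shown below.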
Example Output:
Result: [('Alice Johnson',), ('Bob Smith',), ('Catherine Lee',), ('David Brown',), ('Emma Wilson',), ('Frank Harris',), ('Grace Taylor',), ('Henry Davis',), ('Isabella Moore',), ('Jack Martin',)]