An LLM-powered agent system that works with structured datasets (CSV, XLSX) to provide:
- Insight Generation → Convert natural language queries into SQL (DuckDB) and return insights.
- Data Augmentation → Enhance datasets by applying annotations or transformations suggested by an LLM.
Built with Python, DuckDB, and the OpenRouter API, which gives access to models such as DeepSeek and Llama.
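OpenRouter exposes an OpenAI-compatible chat completions endpoint. The requirements list the OpenAI Python client, which presumably handles these calls in the project. The stdlib sketch below only shows the shape of the underlying HTTP request. The names `build_chat_request` and `send_chat_request` are hypothetical and do not come from `agents.py`.

```python
import json
import urllib.request
from typing import Dict, List

# OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(api_key: str, model: str,
                       messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for OpenRouter."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the OpenAI Python client (v1+), the equivalent setup is `OpenAI(base_url="https://openrouter.ai/api/v1", api_key=...)` followed by `client.chat.completions.create(model=..., messages=...)`.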
Features:
- Natural language → SQL query generation
- Execute queries on CSV data via DuckDB
- Augment datasets with new fields
- Save outputs as CSV
- Secure API key management via a `.env` file
Project structure:

```text
llama_endpoint/
│── agents.py            # InsightGenerationAgent & DataAugmentationAgent
│── utils.py             # Helpers (CSV load/save, SQL runner)
│── main.py              # CLI entry point
│── sample_glassdoor.csv # Example dataset
│── requirements.txt     # Dependencies
│── README.md            # Documentation
```
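`main.py` is the CLI entry point, and the usage examples in this README pass `--input`, `--query`, and `--mode`. The following is a minimal sketch of how such a CLI could be parsed with `argparse`. It assumes all three flags are required and that `--mode` accepts only `insight` or `augment`. Neither assumption is confirmed by the source. `parse_args`, `output_path`, and `DEFAULT_OUTPUTS` are hypothetical names.

```python
import argparse
from typing import List, Optional

# Output file each mode writes to, as listed under "Outputs:" in this README.
DEFAULT_OUTPUTS = {"insight": "insight_output.csv", "augment": "augment_output.csv"}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three flags used in the usage examples."""
    parser = argparse.ArgumentParser(
        description="LLM-powered SQL insights and data augmentation")
    parser.add_argument("--input", required=True, help="Path to the input CSV file")
    parser.add_argument("--query", required=True, help="Natural language instruction")
    parser.add_argument("--mode", choices=sorted(DEFAULT_OUTPUTS), required=True,
                        help="insight: NL -> SQL; augment: add new fields")
    return parser.parse_args(argv)


def output_path(mode: str) -> str:
    """Return the CSV file that a given mode writes to."""
    return DEFAULT_OUTPUTS[mode]
```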
Clone this repo and set up your environment:
```bash
git clone https://github.com/rc02041998/llm-sql-augmentor
cd llm-sql-augmentor
python3 -m venv venv
source venv/bin/activate   # Mac/Linux
venv\Scripts\activate      # Windows
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file in the project root:

```
OPENROUTER_API=your_openrouter_api_key
MODEL=deepseek-chat
```
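In the project, `python-dotenv` reads this file. Its `load_dotenv()` does not override variables that are already set, by default. The stdlib sketch below is a simplified stand-in that illustrates the behavior. It ignores features such as `export` prefixes, multi-line values, and variable interpolation. `parse_env` and `load_env` are hypothetical helpers, not part of the project.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (simplified)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. MODEL="deepseek-chat".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> None:
    """Copy values into os.environ without overriding variables already set."""
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)
```

After loading, the key and model are read with `os.environ["OPENROUTER_API"]` and `os.environ["MODEL"]`.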
Convert natural language to SQL and run it on your CSV:
```bash
python main.py --input sample_glassdoor.csv \
  --query "Show the count of reviews of each designation" \
  --mode insight
```
Outputs:
- Generated SQL query
- Results printed in the terminal
- Results saved as `insight_output.csv`
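Insight mode must turn the question into SQL that DuckDB can run. The sketch below outlines one way to do that. It describes the table schema in a prompt, pulls the SQL out of the model's reply, whether or not the reply wraps it in a fenced block, and applies a crude read-only check before execution. This is an assumption about the flow, not the code in `agents.py`. The table name `reviews` and all function names are hypothetical.

```python
import re
from typing import List

FENCE = "`" * 3  # built this way so no literal triple backtick appears in this source
SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def build_sql_prompt(question: str, table: str, columns: List[str]) -> str:
    """Describe the table schema and ask the model for a single DuckDB SELECT."""
    schema = ", ".join(columns)
    return (
        f"You write DuckDB SQL. Table `{table}` has columns: {schema}.\n"
        "Return one SELECT statement and nothing else.\n"
        f"Question: {question}"
    )


def extract_sql(reply: str) -> str:
    """Pull the SQL out of a model reply, with or without a fenced block."""
    match = SQL_BLOCK.search(reply)
    sql = match.group(1) if match else reply
    return sql.strip().rstrip(";").strip()


def is_read_only(sql: str) -> bool:
    """Crude guard: accept only statements starting with SELECT or WITH."""
    words = sql.split(None, 1)
    return bool(words) and words[0].upper() in {"SELECT", "WITH"}
```

A keyword check like `is_read_only` is only a first line of defense. Running queries on a connection that holds nothing but the loaded dataset limits what a bad statement can touch.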
Add new annotations/fields to the dataset:
```bash
python main.py --input sample_glassdoor.csv \
  --query "Add a column 'sentiment' labeling each review as 'positive' (rating >=4), 'neutral' (rating 3), or 'negative' (rating <=2)." \
  --mode augment
```
Outputs:
- Augmented dataset
- Saved as `augment_output.csv`
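The sentiment rule in the augmentation example is deterministic, so an LLM-written `sentiment` column can be checked against it. The following sketch assumes ratings are integers from 1 to 5, since the rule does not cover values like 3.5. It also assumes rows arrive as string dicts, as `csv.DictReader` produces. `sentiment_from_rating` and `check_sentiment_column` are hypothetical helpers.

```python
from typing import Dict, List


def sentiment_from_rating(rating: int) -> str:
    """Rule from the augment query: >=4 positive, 3 neutral, <=2 negative."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"


def check_sentiment_column(rows: List[Dict[str, str]]) -> List[int]:
    """Return indices of rows whose 'sentiment' value disagrees with the rule."""
    return [
        i for i, row in enumerate(rows)
        if row.get("sentiment") != sentiment_from_rating(int(row["rating"]))
    ]
```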
Example rows from `sample_glassdoor.csv`:
| designation | rating | review_text | date |
|---|---|---|---|
| Software Eng | 5 | Great place to work | 2018-08-01 |
| Analyst | 2 | Poor management and long hours | 2018-08-05 |
| Manager | 3 | Average experience | 2018-08-12 |
| Intern | 1 | Toxic culture, no learning opportunities | 2018-08-15 |
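To make the insight example concrete, the sketch below loads the four sample rows into an in-memory table and runs one plausible translation of "Show the count of reviews of each designation". The translation is an assumption, since the actual SQL depends on the model. Python's built-in `sqlite3` stands in for DuckDB so the example has no dependencies. This particular statement is also valid DuckDB SQL. The table name `reviews` is hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

# The four sample rows from sample_glassdoor.csv shown in the table.
ROWS = [
    ("Software Eng", 5, "Great place to work", "2018-08-01"),
    ("Analyst", 2, "Poor management and long hours", "2018-08-05"),
    ("Manager", 3, "Average experience", "2018-08-12"),
    ("Intern", 1, "Toxic culture, no learning opportunities", "2018-08-15"),
]

# One plausible SQL translation of the insight query.
QUERY = """
SELECT designation, COUNT(*) AS review_count
FROM reviews
GROUP BY designation
ORDER BY designation
"""


def run_query(rows: Sequence[tuple], query: str) -> List[Tuple]:
    """Load rows into an in-memory table and run the query (sqlite3 stands in for DuckDB)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE reviews "
                    "(designation TEXT, rating INTEGER, review_text TEXT, date TEXT)")
        con.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for designation, count in run_query(ROWS, QUERY):
        print(f"{designation}: {count}")
```

DuckDB can skip the explicit load step entirely by reading the file in the `FROM` clause, e.g. `FROM read_csv_auto('sample_glassdoor.csv')`.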
Requirements:
- Python 3.9+
- Pandas
- DuckDB
- OpenAI Python client
- python-dotenv
Install all with:
```bash
pip install -r requirements.txt
```
Pull requests are welcome! For major changes, please open an issue first to discuss your ideas.
Acknowledgments:
- DuckDB for in-memory SQL execution
- OpenRouter for LLM access
- Pandas for data handling