Note: This project was originally developed at the SRH-Heidelberg-University-ADSA organization as a Proof of Concept for Deutsche Bank AG. This is a personal copy for portfolio purposes. Original repo: SRH-Heidelberg-University-ADSA/SAS2Py
AI-powered tool for translating legacy SAS code into modern Python using fine-tuned Large Language Models and Retrieval-Augmented Generation (RAG).
University: SRH Heidelberg University
Program: Applied Data Science & Analytics
Industry Partner: Deutsche Bank AG
Enterprise organizations running legacy SAS analytics face increasing pressure to modernize. Manual code migration is slow, error-prone, and expensive. SAS2Py automates this translation using a combination of fine-tuned LLMs and a hybrid RAG pipeline that retrieves semantically similar examples from a curated dataset of 164 SAS→Python translation pairs.
The system supports three fine-tuned models that users can switch between at runtime, each optimized for different translation scenarios:
| Model | Parameters | VRAM | Best For |
|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5B | ~3 GB | Fast translations, low-resource environments |
| DeepSeek Coder 6.7B | 6.7B | ~13 GB | Balanced accuracy and speed |
| Llama 3.1 8B Instruct | 8B | ~15 GB | Complex multi-step SAS programs |
- Multi-model translation with runtime hot-swapping via REST API
- Hybrid RAG combining vector similarity search (pgvector) with full-text keyword search
- 8-step post-processing pipeline that cleans FIM tokens, SAS code leaks, and model artifacts
- Dependency analysis detecting macros, libraries, remote execution, and unsupported PROCs
- Syntax validation via AST parsing with missing import detection
- Web interface (Streamlit) with file upload, model selection, and user feedback
- Audit logging for compliance tracking and feedback collection
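The syntax-validation feature above can be sketched with Python's standard `ast` module. The function name, return shape, and the short list of "known" library aliases below are illustrative assumptions, not the project's actual API:

```python
import ast


def validate_python(code: str) -> dict:
    """Simplified sketch of AST-based syntax validation with
    missing-import detection for generated code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return {"valid": False, "error": f"line {exc.lineno}: {exc.msg}"}

    # Collect names that are imported somewhere in the module.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.asname or alias.name.split(".")[0]
                            for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imported.update(alias.asname or alias.name for alias in node.names)

    # Flag common library aliases that are used but never imported.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    known_libs = {"pd", "np", "pandas", "numpy"}
    missing = sorted((used & known_libs) - imported)
    return {"valid": True, "missing_imports": missing}
```

For example, `validate_python("df = pd.read_csv('x.csv')")` reports `pd` as a missing import, while code that starts with `import pandas as pd` passes cleanly.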
SAS2Py follows a 4-layer architecture with clear separation of concerns:
```
Layer 1: Presentation    → Streamlit UI (port 8501)
Layer 2: Business Logic  → FastAPI + Translation Pipeline (port 8000)
Layer 3: Persistence     → Connection Pool + Query Layer
Layer 4: Database        → PostgreSQL 16 + pgvector
```
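The persistence and database layers together serve the hybrid retrieval described above (vector similarity via pgvector plus full-text keyword search). A sketch of what such a combined query could look like — the table name `code_translations` appears later in this README, but the column names and the 0.7/0.3 weighting are illustrative assumptions, not the project's actual schema:

```python
def build_hybrid_query(k: int = 3) -> str:
    """Combine pgvector cosine distance with full-text rank in one SQL
    query (a sketch; column names and weights are assumptions)."""
    return f"""
    SELECT id, sas_code, python_code,
           1 - (embedding <=> %(query_vec)s) AS vector_score,
           ts_rank(to_tsvector('english', sas_code),
                   plainto_tsquery('english', %(query_text)s)) AS text_score
    FROM code_translations
    ORDER BY 0.7 * (1 - (embedding <=> %(query_vec)s))
           + 0.3 * ts_rank(to_tsvector('english', sas_code),
                           plainto_tsquery('english', %(query_text)s)) DESC
    LIMIT {k};
    """
```

Here `<=>` is pgvector's distance operator, so `1 - distance` acts as a similarity score that can be blended with PostgreSQL's `ts_rank` keyword score.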
Translation Pipeline:

```
User SAS Code
  → Dependency Analyzer (detect macros, datasets, warnings)
  → RAG Service (hybrid vector + full-text retrieval)
  → Prompt Builder (Full2ex for small models, Full for large)
  → Model Service (generate Python via LLM)
  → Post-Processor (clean FIM tokens, SAS leaks)
  → Validator (check Python syntax via AST)
  → Audit Log
  → Return result
```
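In code, the pipeline above amounts to a linear chain of services. The sketch below mirrors the service modules listed later under `backend/services/`, but the function signatures and the stub implementations are simplified illustrations, not the project's actual code:

```python
import ast

# Trivial stubs so the control-flow sketch runs end-to-end.
def analyze_dependencies(code):
    return ["%macro detected"] if "%macro" in code else []

def retrieve_examples(code, k=3):
    return [("proc sort ...", "df.sort_values(...)")][:k]

def build_prompt(code, examples):
    return f"Examples: {examples}\nTranslate: {code}"

def generate(prompt):
    return "df = df.sort_values('name')  # stub model output"

def post_process(raw):
    return raw.split("<fim>")[0].strip()  # e.g. strip trailing FIM tokens

def check_syntax(code):
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def audit_log(*args):
    pass


def translate(sas_code: str) -> dict:
    """Simplified sketch of the translation pipeline's control flow."""
    warnings = analyze_dependencies(sas_code)    # macros, datasets, PROCs
    examples = retrieve_examples(sas_code, k=3)  # hybrid RAG retrieval
    prompt = build_prompt(sas_code, examples)    # Full2ex or Full template
    raw = generate(prompt)                       # active LLM
    python_code = post_process(raw)              # clean FIM tokens, SAS leaks
    valid = check_syntax(python_code)            # AST-based validation
    audit_log(sas_code, python_code, valid)
    return {"python": python_code, "valid": valid, "warnings": warnings}
```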
See docs/ARCHITECTURE.md for the full architecture documentation.
The architecture evolved through multiple design iterations — from a microservices design (Redis, Celery, Nginx, Ollama) to the current streamlined 4-layer approach. See figures/architecture_diagrams/README.md for the full evolution with all 6 diagrams.
This system is designed for GPU servers. It was developed and tested on an NVIDIA H200 NVL server (141 GB VRAM, 256 GB RAM). A GPU server is required for running all three models and for production deployment.
Recommended: GPU Server (for all models and production use)
| Requirement | Details | How to check |
|---|---|---|
| NVIDIA GPU | 16+ GB VRAM (A100, H100, H200, etc.) | nvidia-smi |
| CUDA | 12.1+ drivers | nvidia-smi (shows CUDA version) |
| RAM | 32 GB+ | free -h |
| Disk Space | 50+ GB free (31 GB models + dependencies) | df -h |
| Python | 3.11+ (tested on 3.12) | python --version |
| PostgreSQL | 16+ with pgvector | psql --version |
| Docker | For frontend deployment (can be on any machine, not necessarily the GPU server) | docker --version |
| OS | Linux (Ubuntu 22.04+ recommended) | cat /etc/os-release |
Installing prerequisites on Ubuntu/Debian:

```bash
sudo apt install postgresql postgresql-16-pgvector docker.io
```
Possible but not recommended: High-end laptop/workstation
Running on a laptop is technically possible but comes with significant limitations:
| Model | Min VRAM | Min RAM | Laptop GPU examples | Experience |
|---|---|---|---|---|
| Qwen 1.5B only | 3 GB | 16 GB | RTX 3060, RTX 4060 | Workable, 5-10 sec/translation |
| + DeepSeek 6.7B | 13 GB | 32 GB | RTX 4080, RTX 4090 | Slow model switching |
| + Llama 3.1 8B | 16 GB | 32 GB | RTX 4090 (24 GB) | Very slow, may run out of memory |
Laptops will not be able to run the backend (Steps 1-8) without a dedicated NVIDIA GPU. However, laptops can run the frontend Docker container (Step 9.3-9.5) and connect to a remote backend via the Cloudflare Tunnel.
Per-model requirements:
| Model | Parameters | VRAM | Disk Space | Speed |
|---|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5B | ~3 GB | ~2.9 GB | ~2-4 sec |
| DeepSeek Coder 6.7B | 6.7B | ~13 GB | ~13 GB | ~5-10 sec |
| Llama 3.1 8B Instruct | 8B | ~16 GB | ~15 GB | ~8-15 sec |
| All three | — | ~16 GB (one at a time) | ~31 GB | — |
The Qwen 1.5B model is recommended for getting started — smallest footprint, fastest inference, best syntax validity (100%).
```bash
git clone https://github.com/SRH-Heidelberg-University-ADSA/SAS2Py.git
cd SAS2Py
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Note (PyTorch with CUDA): If `torch` installs without GPU support, install it separately first:

```bash
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```
```bash
cp .env.example .env
```

Now open `.env` in a text editor:

```bash
nano .env
```

Update only these values — choose any username and password you want (you will create this user in PostgreSQL during Step 5):

```bash
# Change these to YOUR database credentials:
DB_USER=your_postgres_username
DB_PASSWORD=your_postgres_password

# These defaults usually work — only change if your setup is different:
DB_HOST=localhost
DB_PORT=5432
DB_NAME=sas_translator
```

Save and exit: Ctrl+X, then Y, then Enter.

Example: If you set `DB_USER=sarvesh` and `DB_PASSWORD=sarvesh14`, you will run `CREATE USER sarvesh WITH PASSWORD 'sarvesh14' SUPERUSER;` in Step 5.

All other values in `.env` work with defaults. No code changes are needed anywhere.
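The backend reads these values at startup. As a minimal sketch of how a `.env` file like the one above is parsed (the project presumably uses a library such as python-dotenv — this hand-rolled parser is only illustrative):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


example = """
# Change these to YOUR database credentials:
DB_USER=sarvesh
DB_PASSWORD=sarvesh14
DB_HOST=localhost
DB_PORT=5432
DB_NAME=sas_translator
"""
print(parse_env(example)["DB_NAME"])  # sas_translator
```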
Install PostgreSQL + pgvector (if not already installed):

```bash
# Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y postgresql postgresql-16-pgvector

# Start PostgreSQL:
sudo pg_ctlcluster 16 main start
# Or: sudo service postgresql start
```

Verify PostgreSQL is running:

```bash
sudo pg_isready
```

Should show:

```
/var/run/postgresql:5432 - accepting connections
```
Create a database user and the database:

Replace `your_db_user` and `your_db_password` with the same values you set in `.env` (Step 4):

```bash
sudo -u postgres psql -c "CREATE USER your_db_user WITH PASSWORD 'your_db_password' SUPERUSER;"
sudo -u postgres psql -c "CREATE DATABASE sas_translator OWNER your_db_user;"
sudo -u postgres psql -d sas_translator -c "CREATE EXTENSION IF NOT EXISTS vector;"
sudo -u postgres psql -d sas_translator -f database/migrations/001_initial_schema.sql
```

Example (if your `.env` has `DB_USER=sarvesh14` and `DB_PASSWORD=sarvesh14`):

```bash
sudo -u postgres psql -c "CREATE USER sarvesh14 WITH PASSWORD 'sarvesh14' SUPERUSER;"
sudo -u postgres psql -c "CREATE DATABASE sas_translator OWNER sarvesh14;"
sudo -u postgres psql -d sas_translator -c "CREATE EXTENSION IF NOT EXISTS vector;"
sudo -u postgres psql -d sas_translator -f database/migrations/001_initial_schema.sql
```

Alternative — Automated setup (if you have password-based PostgreSQL auth):

```bash
bash scripts/setup_database.sh
```

This reads credentials from `.env` automatically. If it asks for a password, use the manual method above instead.
Option A — Load pre-built SQL dump (faster, includes embeddings):

```bash
sudo -u postgres psql -d sas_translator -f database/code_translations.sql
```

Option B — Generate fresh embeddings (takes ~30 seconds):

```bash
python scripts/load_data.py
```

Both options give you the same result: 164 SAS-to-Python examples with vector embeddings ready for RAG retrieval. Verify with:

```bash
sudo -u postgres psql -d sas_translator -c "SELECT COUNT(*) FROM code_translations;"
```

Should show `164`.
Models are not included in the repository (~31 GB total). You have three options:
Option A — Download fine-tuned models (recommended, same results as live demo):
Download the fine-tuned model weights (~26 GB) from Google Drive:
https://drive.google.com/file/d/1CO4jdgd9uhiNW4075TBehrwV6kwmf4uI/view
After downloading SAS2Py_model_weights.zip, extract the model folders:

```bash
unzip SAS2Py_model_weights.zip
cp -r backend/models/qwen_v6_merged models/
cp -r backend/models/deepseek_merged models/
cp -r backend/models/llama_8b_merged models/
rm -rf backend/models
```

Option B — Download base models from Hugging Face (requires re-training):

```bash
python scripts/download_models.py --model qwen-1.5b   # ~3 GB
python scripts/download_models.py --model all         # ~31 GB (all 3)
```

These are the original unfine-tuned models. You will need to run the training scripts to get comparable results. See docs/REPRODUCIBILITY.md.

Option C — Start with Qwen only (quickest, ~3 GB):

If you just want to test the system, download only the Qwen model (Option A or B). It is the smallest (2.9 GB) and loads fastest.
After downloading, your models/ directory should look like:

```
models/
├── qwen_v6_merged/
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── deepseek_merged/       # (optional, 13 GB)
└── llama_8b_merged/       # (optional, 15 GB)
```
```bash
cd SAS2Py
source venv/bin/activate
uvicorn backend.main:app --host 0.0.0.0 --port 8000
```

Wait until you see `API ready!` in the logs (~10-30 seconds for model loading). Keep this terminal open.

Verify in another terminal:

```bash
curl http://localhost:8000/health
```

The backend runs on your GPU server, but users need a web interface to interact with it. The frontend (Streamlit) runs inside a Docker container and connects to the backend through a Cloudflare Tunnel.
Why a tunnel? The Docker container is an isolated environment. Inside the container, localhost means the container itself — not your server. So the frontend inside Docker cannot reach the backend at localhost:8000. The Cloudflare Tunnel creates a public URL that forwards to your backend, and the container uses that URL to communicate.
How it works:
```
User's Browser
      |
      v
Docker Container (Streamlit frontend on port 8501)
      |  reads SAS2PY_API environment variable
      |  sends translation requests to that URL
      v
Cloudflare Tunnel (public URL)
      |  forwards requests to localhost:8000
      v
GPU Server (FastAPI backend on port 8000)
      |  loads AI model, queries database
      v
Returns Python translation
```
What is SAS2PY_API? It is an environment variable that tells the frontend where the backend API is running. In the code (frontend/api_client.py):

```python
API_BASE = os.getenv("SAS2PY_API", "http://localhost:8000")
```

When running inside Docker, you must set this to the tunnel URL. Otherwise the frontend cannot reach the backend.
Step 9.1 — Install Cloudflare Tunnel on the GPU server:

Run these commands on the GPU server (where the backend is running):

```bash
sudo wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 \
  -O /usr/local/bin/cloudflared && sudo chmod +x /usr/local/bin/cloudflared
```

Step 9.2 — Start the tunnel (on the GPU server):
```bash
nohup cloudflared tunnel --url http://localhost:8000 \
  --proxy-connect-timeout 600s \
  --proxy-keepalive-timeout 600s \
  > /tmp/sas2py_tunnel.log 2>&1 &
sleep 10
grep -o 'https://[a-z0-9-]*\.trycloudflare\.com' /tmp/sas2py_tunnel.log
```

Save the URL that appears (e.g., https://random-words.trycloudflare.com). You need it in Step 9.4.
Why 600s timeout? Model switching (loading Llama 8B from disk) can take 2+ minutes. The default timeout would drop the connection before the switch completes.
Step 9.3 — Build the frontend Docker image:

On any machine with Docker installed (your laptop, the GPU server, or a separate server):

```bash
docker build -t sas2py-frontend frontend/
```

Step 9.4 — Run the frontend container:

Replace `YOUR_TUNNEL_URL` with the URL from Step 9.2:

```bash
docker run -d \
  --name sas2py-frontend \
  -p 8501:8501 \
  -e SAS2PY_API=YOUR_TUNNEL_URL \
  --restart unless-stopped \
  sas2py-frontend
```
Example:

```bash
docker run -d \
  --name sas2py-frontend \
  -p 8501:8501 \
  -e SAS2PY_API=https://random-words.trycloudflare.com \
  --restart unless-stopped \
  sas2py-frontend
```

Using Portainer.io? Instead of `docker run`, you can deploy via Portainer:

- Go to Stacks and click Add Stack
- Paste the contents of `docker-compose.yml`
- Add environment variable: `SAS2PY_API=` your tunnel URL
- Click Deploy the stack

Portainer is a web-based GUI for managing Docker containers. It is optional — the `docker run` command above does the same thing.
Step 9.5 — Open the application:

Open your browser and go to:

- If Docker runs on your local machine: `http://localhost:8501`
- If Docker runs on a remote server: `http://<server-ip>:8501` (replace `<server-ip>` with the actual IP address of the server running Docker)
You should see the SAS2Py translator interface with:
- SAS code input area
- Model selector in the sidebar (Qwen 1.5B, DeepSeek 6.7B, Llama 3.1 8B)
- "Backend Online" indicator in the sidebar
- Python code output area
Try translating:

```sas
proc sort data=mydata; by name; run;
```

Note on model switching: When switching between models in the sidebar, wait for the switch to complete before clicking another model. Loading DeepSeek 6.7B takes ~2 minutes and Llama 3.1 8B takes ~3 minutes on first load (faster on subsequent loads). The Qwen 1.5B model loads in ~10 seconds.
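For reference, the expected Python equivalent of that PROC SORT is a one-line pandas call. This is a hand-written translation for illustration, not the model's guaranteed output, and `mydata` here is a made-up sample dataset:

```python
import pandas as pd

# mydata stands in for the SAS dataset of the same name.
mydata = pd.DataFrame({"name": ["Clara", "Ana", "Ben"], "score": [3, 1, 2]})

# proc sort data=mydata; by name; run;
mydata = mydata.sort_values(by="name").reset_index(drop=True)
print(mydata["name"].tolist())  # ['Ana', 'Ben', 'Clara']
```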
When the tunnel URL changes:
The Cloudflare quick tunnel generates a new URL each time it restarts. When this happens:
- Start a new tunnel (repeat Step 9.2)
- Stop the old container: `docker stop sas2py-frontend && docker rm sas2py-frontend`
- Start a new container with the new URL (repeat Step 9.4)
The tunnel stays alive as long as the server is running and the cloudflared process is not killed; the URL only changes when the tunnel restarts (for example, after a server reboot).
Want a permanent URL? Set up a Cloudflare named tunnel with your own domain. See docs/DEPLOYMENT.md for details.

Local development on a machine with a GPU? If backend, frontend, and browser are all on the same machine, you can skip Steps 9.1-9.5 entirely. Just run `streamlit run frontend/app.py` in a second terminal and open `http://localhost:8501` in your browser. No Docker or tunnel needed — the frontend defaults to `http://localhost:8000`, which already points to the backend.
```bash
# Check backend health
curl http://localhost:8000/health

# Open Swagger API docs
# Visit: http://localhost:8000/docs
```

```bash
# Unit tests (no GPU, no database needed — runs anywhere)
python -m pytest tests/unit/ -v

# Integration tests (needs database connection)
python -m pytest tests/integration/ -v

# End-to-end tests (needs running backend on port 8000)
SAS2PY_TEST_URL=http://localhost:8000 python -m pytest tests/e2e/ -v

# All tests
python -m pytest tests/ -v
```

Expected results: 46 unit + 15 integration + 14 e2e = 75 tests total.
See tests/README.md for details on what each test suite covers.
| What | Where | Example |
|---|---|---|
| Database username | `.env` → `DB_USER` | `DB_USER=postgres` |
| Database password | `.env` → `DB_PASSWORD` | `DB_PASSWORD=mypassword` |
| Database port (if non-default) | `.env` → `DB_PORT` | `DB_PORT=5432` |
| Nothing else | — | All other config has working defaults |

You do NOT need to change any Python code, any file paths, or any imports. Everything is configured through the `.env` file.
```
SAS2Py/
├── backend/                          # FastAPI backend (Layer 2 + 3)
│   ├── main.py                       # Application entry point
│   ├── config.py                     # Centralized configuration
│   ├── api/routes.py                 # REST API endpoints
│   ├── schemas/                      # Pydantic request/response models
│   ├── services/                     # Business logic services
│   │   ├── translation_service.py    # Pipeline orchestrator
│   │   ├── model_service.py          # Multi-model management
│   │   ├── rag_service.py            # Hybrid RAG retrieval
│   │   ├── post_processor.py         # 8-step output cleaning
│   │   ├── validator.py              # Python syntax validation
│   │   └── dependency_analyzer.py    # SAS dependency detection
│   └── database/                     # Database connection and queries
│
├── frontend/                         # Streamlit web interface
│   ├── app.py                        # Main UI application
│   ├── api_client.py                 # Backend API connector
│   └── Dockerfile                    # Frontend container config
│
├── docker-compose.yml                # Docker deployment for frontend
│
├── models/                           # AI model weights (git-ignored)
│   ├── configs/                      # Model configuration files
│   └── model_cards/                  # Per-model documentation
│
├── training/                         # Training data and configs
│   ├── datasets/                     # SAS→Python translation pairs
│   │   ├── training.json             # 164 training examples
│   │   ├── test.json                 # 20 test examples
│   │   └── validation.json           # 31 validation examples
│   ├── scripts/                      # Model fine-tuning scripts
│   │   └── train_qwen_v6.py          # Qwen 1.5B LoRA training
│   └── hyperparameters/              # Training configurations and progression
│
├── benchmarks/                       # Benchmark evaluation data
│   ├── sas_programs/                 # Test SAS inputs
│   └── python_reference/             # Expected Python outputs
│
├── experiments/                      # Experiment tracking
│   ├── experiment_logs/              # Training run logs
│   └── experiment_configs/           # Hyperparameter variations
│
├── results/                          # Evaluation results
│   ├── evaluation_metrics/           # CodeBLEU, syntax scores
│   └── comparison_tables/            # Model comparison data
│
├── figures/                          # Diagrams and plots
│   ├── architecture_diagrams/        # 6 diagrams showing architecture evolution
│   └── evaluation_plots/
│
├── research/                         # Research artefacts
│   ├── related_papers/               # Reference papers (LoRA, RAG, CodeBLEU, etc.)
│   ├── competitor_analysis/          # Competitor tools comparison
│   └── README.md                     # Literature review and competitor analysis
│
├── notebooks/                        # Jupyter notebooks
├── resources/                        # Project artefacts
│   ├── presentations/                # Meeting and demo presentations
│   ├── meeting_minutes/              # Meeting notes
│   └── demo_videos/                  # Demo video links (hosted on Google Drive)
├── database/migrations/              # SQL schema files
├── scripts/                          # Setup and utility scripts
├── tests/                            # Test suite (unit, integration, e2e)
└── docs/                             # Documentation
    ├── ARCHITECTURE.md
    ├── SETUP.md
    ├── DEPLOYMENT.md
    ├── API.md
    ├── MODEL_CARD.md
    ├── DATASET.md
    ├── LEARNINGS.md
    ├── TROUBLESHOOTING.md
    └── FAQ.md
```
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check and model status |
| `POST` | `/api/translate` | Translate SAS code to Python |
| `GET` | `/api/models` | List available models |
| `POST` | `/api/switch-model` | Switch the active model |
| `POST` | `/api/feedback` | Submit translation feedback |
| `GET` | `/api/stats` | System statistics |
| `GET` | `/docs` | Interactive Swagger UI |
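A minimal Python client for the `/api/translate` endpoint might look like the sketch below. The request and response field names (`sas_code`, `model`, `python_code`) are assumptions based on this README — check docs/API.md or the Swagger UI for the actual schema:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # or your Cloudflare Tunnel URL


def build_request(sas_code: str, model: str = "qwen") -> urllib.request.Request:
    """Build a POST request for /api/translate (field names assumed)."""
    payload = json.dumps({"sas_code": sas_code, "model": model}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response.
    Long timeout to survive model switches (see the 600s tunnel timeout)."""
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())

# Example (requires a running backend):
#   result = send(build_request("proc sort data=mydata; by name; run;"))
#   print(result)
```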
See docs/API.md for complete API documentation.
The training dataset contains 164 curated SAS→Python translation pairs derived from real-world Deutsche Bank analytics workflows. Examples span:
- PROC SQL queries, PROC SORT, PROC EXPORT/IMPORT
- DATA step operations with merges and transformations
- SAS macro definitions and macro variable handling
- Remote execution wrappers (rsubmit/endrsubmit)
All examples are normalized to remove infrastructure wrappers before embedding generation. See docs/DATASET.md for dataset versioning, format details, and category distribution.
A live deployment of SAS2Py is available at:
https://sas2py-translator.app.data-lab.site/
This instance runs on the SRH University compute server (NVIDIA H200 NVL, 141 GB VRAM) with all three models available for hot-swapping.
Note: Availability depends on the university server being online. If the link is down, follow the installation instructions above to run a local instance.
| Component | Technology |
|---|---|
| Backend API | FastAPI, Uvicorn |
| Frontend | Streamlit |
| Database | PostgreSQL 16, pgvector |
| LLMs | Qwen2.5-Coder, DeepSeek Coder, Llama 3.1 |
| Embeddings | Sentence Transformers (all-mpnet-base-v2) |
| ML Framework | PyTorch, Transformers, PEFT |
| Deployment | Docker, Portainer.io, Cloudflare Tunnel |
| Testing | pytest |
| Document | Description |
|---|---|
| `docs/ARCHITECTURE.md` | System architecture and translation pipeline |
| `docs/SETUP.md` | Detailed installation guide |
| `docs/DEPLOYMENT.md` | Docker and Portainer deployment guide |
| `docs/API.md` | REST API endpoint documentation |
| `docs/MODEL_CARD.md` | Model details, training, and performance |
| `docs/DATASET.md` | Dataset versioning and format |
| `docs/LEARNINGS.md` | Key learnings and future scope |
| `docs/TROUBLESHOOTING.md` | Problems faced and solutions |
| `docs/FAQ.md` | Frequently asked questions |
| `docs/REPRODUCIBILITY.md` | Step-by-step guide to replicate training and results |
| `docs/LIMITATIONS.md` | Known limitations across dataset, models, RAG, and infrastructure |
| `training/hyperparameters/README.md` | Exact training parameters with progression |
| `figures/architecture_diagrams/README.md` | Architecture evolution (6 diagrams) |
| `research/README.md` | Related papers and competitor analysis |
| `resources/demo_videos/README.md` | Demo video links |
Video demonstrations are hosted on Google Drive. See resources/demo_videos/README.md for links.
See research/README.md for:
- Related papers (LoRA, RAG, CodeBLEU, Sentence-BERT, StarCoder)
- Competitor analysis (SAS CodeConverter, SPROCKET, ChatGPT prompting, saspy)
- How SAS2Py differs from existing approaches
| Name | Role |
|---|---|
| Sarvesh Uday Mokal | Machine Learning Engineer, MLOps & Backend Developer |
| Swathi Chandrashekargiri | Machine Learning Engineer, Research Analyst |
| Kotrashetti Basavraj C. | Frontend Developer, QA Engineer |
| Tanuj Agarwal | Machine Learning Engineer, Data Curator |
| Akshata Hate | Data Analyst & QA |
Supervisors:
- Prof. Dr. Swati Chandna, SRH University Heidelberg, Applied Data Science and Analytics
- Prof. Dr. Ghadeer Mobasher, SRH University Heidelberg, Applied Data Science and Analytics
Industry Partner: Deutsche Bank AG
See CONTRIBUTING.md for detailed contributions per team member.
This project is developed as part of an academic program at SRH University Heidelberg. See LICENSE for details.
See CONTRIBUTING.md for guidelines on how to contribute to this project.