An intelligent SQL learning platform that provides hands-on practice with instant feedback powered by Claude AI. Master SQL through adaptive tutoring, progressive hints, and unlimited AI-generated exercises.
- Semantic Similarity Analysis: Uses vector embeddings to measure how close your answer is to the solution (0-100%)
- Progressive Hints: Get smarter hints based on your similarity score
- 85%+ similar: "Almost there! Check small details"
- 60-85%: "Right structure, review specific parts"
- 40-60%: "Some elements correct, focus on SQL clauses"
- <40%: "Different approach needed, try these concepts..."
- Multiple Attempts: Up to 3 tries per question with improving hints
- Token Optimization: AI calls only when needed (not for correct answers)
- 36 Curated Exercises: Across beginner, intermediate, and advanced levels
- Exercise Completion Tracking: Never repeat exercises you've already mastered
- AI-Generated Exercises: When you complete all exercises, the system automatically generates new unique challenges
- Progress Monitoring: See your completion rate at each level
- Database Preview: View complete database structure at startup
- Example Exercises: See sample questions before diving in
- Real-time Query Execution: Practice against a real SQLite database
- Result Comparison: Side-by-side view of your results vs. expected results
- Rich Sample Data: E-commerce database with 6 tables and realistic data
- Interactive Cheatsheet: Complete SQL reference with examples
- Concept Highlighting: Each exercise shows which SQL concepts it covers
- Built-in Hints: Context-aware hints for each exercise
sql-tutorial/
├── data/
│ ├── sample_database.sql # E-commerce database schema + seed data
│ ├── sample_database.db # SQLite database (auto-created)
│ └── queries.json # 36 practice exercises
├── src/
│ ├── cheatsheet.md # Comprehensive SQL reference
│ └── tutor.py # Main tutorial engine (AI-powered)
├── tests/
├── .env.example # Template for environment variables
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
└── README.md
- Python 3.7+ (Download from python.org)
- Anthropic API Key (Get from console.anthropic.com)
- SQLite3 (Usually pre-installed)
1. Clone the repository
git clone <repository-url>
cd sql-tutorial2. Install dependencies
pip install -r requirements.txtThis installs:
anthropic- Claude AI SDKsentence-transformers- For semantic similaritynumpy- Vector calculationspython-dotenv- Environment variable management
3. Configure API Key
Copy the example environment file and add your API key:
cp .env.example .envEdit .env and add your Anthropic API key:
ANTHROPIC_API_KEY=your-actual-api-key-here
4. Start Learning!
python src/tutor.pyThe database is already initialized and ready to use!
When you launch the tutor:
-
Database Structure Preview
============================================================ DATABASE STRUCTURE ============================================================ 📋 Table: CUSTOMERS ------------------------------------------------------------ • customer_id: INTEGER (PRIMARY KEY, NOT NULL) • name: VARCHAR(100) (NOT NULL) • email: VARCHAR(100) (NOT NULL) • join_date: DATE (NOT NULL) • country: VARCHAR(50) Total rows: 10 [... more tables ...] -
Example Exercises Preview
============================================================ EXAMPLE EXERCISES ============================================================ 🎯 BEGINNER Level Examples: ------------------------------------------------------------ 1. Select all customers from the USA Concepts: SELECT, WHERE Hint: Use WHERE clause to filter by country [... more examples ...] -
Choose Difficulty: Beginner → Intermediate → Advanced
Question: Find all products with price less than $50
Concepts: SELECT, WHERE, comparison operators
Enter your SQL query (end with semicolon):
> SELECT * FROM products WHERE cost < 50;
❌ Not quite right. The query runs but produces different results.
📊 Similarity to solution: 87.3%
Your results:
(No rows returned)
Expected results:
product_id | product_name | category_id | price | stock_quantity
--------------------------------------------------------------
1 | Wireless Mouse | 1 | 29.99 | 150
2 | USB-C Cable | 1 | 12.99 | 300
[... more rows ...]
💡 Generating hint based on your attempt (Attempt 1/3)...
🔍 Hint: You're very close! You're using the right structure with SELECT
and WHERE, but check your column name carefully. The database uses
'price' not 'cost'.
Would you like to try again? (y/n/s to see solution): y
Attempt 2 (end with semicolon):
> SELECT * FROM products WHERE price < 50;
✅ Correct! Great job!
(Solved in 2 attempts)
Progress: 3/12 exercises completed at beginner level
Your results:
[correct results shown]
- System calculates vector similarity between your query and solution
- Higher similarity (>80%) = More specific hints
- Lower similarity (<40%) = Broader conceptual guidance
- 3 attempts per question
- Progressive hints get more detailed with each attempt
- Can request solution at any time
🎉 Congratulations! You've completed all 12 beginner exercises!
🤖 Generating a new challenge for you...
✅ New exercise generated: Find customers who have ordered products
from at least 3 different categories
Question: Find customers who have ordered products from at least 3
different categories
Concepts: JOIN, GROUP BY, HAVING, DISTINCT
The tutorial uses a realistic e-commerce database with 6 interconnected tables:
| Table | Description | Rows |
|---|---|---|
| customers | Customer information (id, name, email, country) | 10 |
| categories | Product categories (Electronics, Books, etc.) | 6 |
| products | Product catalog (35 products with prices) | 35 |
| orders | Customer orders with status tracking | 20 |
| order_items | Line items in each order | 33 |
| reviews | Product reviews with ratings (1-5 stars) | 30 |
customers (1) ──→ (N) orders (1) ──→ (N) order_items (N) ──→ (1) products (N) ──→ (1) categories
│ │
└─────────────────────────────────────────────────────────────┘
reviews (N) ──→ (1)
Perfect for SQL beginners or refreshing fundamentals.
Topics Covered:
SELECT,FROM,WHEREbasics- Filtering with
=,<,>,LIKE,IN,BETWEEN - Sorting with
ORDER BYand limiting withLIMIT - Basic aggregations:
COUNT(),SUM(),AVG() - Pattern matching with wildcards
Example Exercises:
- Select all customers from a specific country
- Find products under $50
- Count total products in database
- Get top 5 most expensive items
Build on fundamentals with multi-table queries.
Topics Covered:
INNER JOIN,LEFT JOIN,RIGHT JOINGROUP BYwith aggregationsHAVINGclause for filtered groups- Multiple table joins
- Subqueries in
WHEREclause - Date filtering
Example Exercises:
- Total orders per customer with names
- Average price per category
- Products never reviewed
- Top 3 customers by spending
Master complex analytical queries.
Topics Covered:
- Window functions:
ROW_NUMBER(),RANK(),DENSE_RANK() PARTITION BYfor grouped analytics- Common Table Expressions (CTEs) with
WITH - Correlated subqueries
- Self-joins
- Complex multi-table analytics
Example Exercises:
- Rank products by revenue within each category
- Calculate running totals per customer
- Find products ordered together frequently
- Monthly sales analysis with CTEs
The system uses semantic similarity to understand not just IF you're wrong, but HOW WRONG you are:
# Vector embedding similarity calculation
similarity = calculate_query_similarity(your_query, solution)
# Returns: 0.873 (87.3% similar)
# Generates context-aware hints
if similarity > 0.8:
hint = "Almost perfect! Check column name: 'country' not 'location'"
elif similarity > 0.6:
hint = "Right approach! Missing the GROUP BY clause"
else:
hint = "Try focusing on JOIN syntax for multi-table queries"Never run out of practice! When you complete all exercises at a level:
# Automatically triggers
generate_new_exercise(level='beginner')
# LLM creates:
{
"question": "Find the average rating for products in the Electronics category",
"solution": "SELECT AVG(r.rating) FROM products p JOIN reviews r ON p.product_id = r.product_id JOIN categories c ON p.category_id = c.category_id WHERE c.category_name = 'Electronics';",
"concepts": ["JOIN", "AVG", "WHERE", "multi-table"],
"hint": "You'll need to join three tables: products, reviews, and categories"
}Exercise Completion Status:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Beginner: ████████████ 12/12 (100%) ✅
Intermediate: ████████░░░░ 8/12 (67%)
Advanced: ██░░░░░░░░░░ 2/12 (17%)
Smart API usage to save costs:
| Scenario | LLM Called? | Tokens Used |
|---|---|---|
| Correct answer | ❌ No | ~0 |
| Wrong answer (hint) | ✅ Yes | ~300-500 |
| Request solution | ✅ Yes | ~500 |
| Generate exercise | ✅ Yes | ~500 |
Average savings: 50-70% reduction in API costs!
$ python src/tutor.py
Loading semantic similarity model...
Model loaded successfully!
[Database structure shown]
[Example exercises shown]
Ready to start practicing?
Press Enter to continue or 'q' to quit:
Choose difficulty level:
1. Beginner
2. Intermediate
3. Advanced
Enter choice (1/2/3): 1
============================================================
SQL Practice Session - BEGINNER Level
============================================================
Question: Count how many customers are from Canada
Concepts: COUNT, WHERE
Enter your SQL query (end with semicolon):
> SELECT COUNT(*) FROM customers WHERE country = 'Canada';
✅ Correct! Great job!
Progress: 1/12 exercises completed at beginner level
Your results:
COUNT(*)
--------
2
Try another question? (y/n): yfrom src.tutor import SQLTutor
# Initialize tutor
tutor = SQLTutor('data/sample_database.db', 'data/queries.json')
# Get exercise
exercise = tutor.get_query_by_level('beginner')
print(f"Question: {exercise['question']}")
# Check answer
user_query = "SELECT * FROM customers WHERE country = 'USA';"
similarity = tutor.calculate_query_similarity(user_query, exercise['solution'])
print(f"Similarity: {similarity*100:.1f}%")
# Get hint if needed
if similarity < 0.9:
hint = tutor.get_progressive_hint(exercise, user_query, similarity, attempt=1)
print(f"Hint: {hint}")
# Validate answer
is_correct = tutor.compare_results(user_query, exercise['solution'])
print(f"Correct: {is_correct}")
tutor.close()Edit data/queries.json:
{
"beginner": [
{
"id": "b_custom1",
"question": "Find all orders placed in March 2024",
"solution": "SELECT * FROM orders WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01';",
"concepts": ["SELECT", "WHERE", "date filtering"],
"hint": "Use date comparison with >= and <"
}
]
}Edit data/sample_database.sql to add tables, data, or create different scenarios:
-- Add a new table
CREATE TABLE suppliers (
supplier_id INTEGER PRIMARY KEY,
supplier_name VARCHAR(100),
country VARCHAR(50)
);
-- Insert data
INSERT INTO suppliers VALUES (1, 'TechSupply Inc', 'USA');Then recreate the database:
rm data/sample_database.db
python -c "import sqlite3; conn = sqlite3.connect('data/sample_database.db'); conn.executescript(open('data/sample_database.sql').read()); conn.close()"Edit src/tutor.py to customize:
Hint generation (line ~121):
prompt = f"""You are a friendly SQL tutor...
[customize tone, detail level, examples]
"""Exercise generation (line ~91):
prompt = f"""Create a {level} exercise that focuses on {specific_topic}...
"""❌ ModuleNotFoundError: No module named 'anthropic'
pip install -r requirements.txt❌ ANTHROPIC_API_KEY not found
# Check .env file exists and contains:
ANTHROPIC_API_KEY=sk-ant-...❌ Database not found
# Database should be auto-created. If not:
python -c "import sqlite3; conn = sqlite3.connect('data/sample_database.db'); conn.executescript(open('data/sample_database.sql').read()); conn.close()"❌ Model download slow
First run downloads ~90MB sentence transformer model. Subsequent runs are instant.
- Run in virtual environment for clean dependencies
- First exercise loads embedding model (~5 seconds)
- Use SSD for faster database queries
- Disable AI features for offline practice (uses template hints)
- OS: Windows 10, macOS 10.14+, Linux
- Python: 3.7 or higher
- RAM: 2 GB (4 GB recommended for AI features)
- Storage: 500 MB (including model cache)
- Internet: Required for AI features
anthropic>=0.40.0 # Claude AI SDK
python-dotenv>=1.0.0 # Environment management
sentence-transformers>=2.2.0 # Semantic similarity
numpy>=1.24.0 # Vector math
- 📖 Read the cheatsheet first - Skim
src/cheatsheet.mdfor SQL syntax reference - 🎯 Start with beginner - Even if experienced, get familiar with the database
- 🧪 Experiment freely - Try different approaches, learn from mistakes
- 💡 Use hints wisely - Try once without hints, then use them to learn
- 📊 Check similarity scores - They show you how close you are
- 🔄 Practice regularly - 10-15 minutes daily beats 2-hour weekend sessions
- 🤖 Learn from AI feedback - Understand WHY your answer was close but not exact
- ✅ Complete all levels - Don't skip to advanced too quickly
- Progress persistence - Save completion across sessions
- Spaced repetition - Smart scheduling of review exercises
- Difficulty customization - Adjust hint detail level
- Performance metrics - Track avg attempts, time per exercise
- Web interface - Browser-based version with syntax highlighting
- Multi-database support - PostgreSQL, MySQL connectors
- Team mode - Shared progress for classrooms
- Export progress - Generate completion certificates
- Voice hints - Optional audio explanations
- VSCode extension - Practice within your IDE
Contributions welcome! Areas for improvement:
- 📝 Add more exercises (especially intermediate/advanced)
- 🗄️ Create alternative database schemas (healthcare, finance, social media)
- 🌍 Translations (internationalization)
- 🎨 Improve hint quality with better prompts
- 🧪 Add unit tests
- 📚 Create video tutorials
- 🐛 Bug fixes and optimizations
How to contribute:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is provided as an educational tool for learning SQL.
API Usage: When using the Anthropic API, ensure compliance with Anthropic's Usage Policies.
Code: Feel free to modify, extend, and share for educational purposes.
- SQLite Documentation
- SQL Style Guide
- SQL Zoo - Additional practice
- Mode SQL Tutorial
CLAUDE.md- Original design specificationsrc/cheatsheet.md- Complete SQL referencedata/queries.json- All exercise definitions
Need help?
- 📖 Check the Troubleshooting section
- 📝 Review
src/cheatsheet.mdfor SQL syntax - 🔍 Read exercise hints carefully
- 💬 Open an issue on GitHub
Found a bug? Please report with:
- Python version (
python --version) - Error message
- Steps to reproduce
Built with:
- Claude AI by Anthropic
- Sentence Transformers for semantic similarity
- SQLite for the database engine
🚀 Happy Learning! Master SQL through intelligent practice.
Made with ❤️ for SQL learners everywhere