A collection of practical AI engineering examples, demos, and proofs of concept for modern AI applications. This repository serves as a playground for experimenting with cutting-edge AI techniques and preparing for AI engineering roles.
```bash
# Clone the repository
git clone <repository-url>
cd ai-engineering-examples

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements/requirements.txt

# Run examples
python examples/speculative_decoding/improved_speculative_decoding.py

# Run interactive demo
streamlit run demos/speculative_decoding_demo.py
```
```
ai-engineering-examples/
├── examples/                    # Production-ready AI engineering examples
│   └── speculative_decoding/    # Speculative decoding implementations
│       ├── improved_speculative_decoding.py
│       └── README.md
├── demos/                       # Interactive demos and visualizations
│   └── speculative_decoding_demo.py
├── playground/                  # Learning & experimentation sandbox
│   ├── pythonic_examples.py     # Pythonic programming exercises
│   └── README.md
├── requirements/                # Dependency management
│   └── requirements.txt
├── venv/                        # Virtual environment (not tracked)
├── .gitignore                   # Ignore unnecessary files
└── README.md                    # This file
```
Location: examples/speculative_decoding/
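A comprehensive implementation of speculative decoding, a technique in which a smaller "draft" model proposes several tokens ahead and a larger "target" model verifies them, keeping the tokens it agrees with and replacing the first one it rejects. Because the target can check several draft tokens in a single forward pass, this can speed up text generation, provided the draft model's proposals are accepted often enough to pay for the extra draft work.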
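The draft-then-verify loop can be sketched in a few lines. This is a minimal, dependency-free illustration of the *greedy* variant, not the code in `improved_speculative_decoding.py`: it assumes toy "models" that are plain functions returning the next token id, and `speculative_decode_greedy` and `num_speculative` are hypothetical names chosen for this sketch.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a language model: maps a token prefix to its greedy next token.
# (Assumption for illustration; real models return logits over a vocabulary.)
Model = Callable[[List[int]], int]


def speculative_decode_greedy(
    target: Model,
    draft: Model,
    prompt: List[int],
    max_new_tokens: int,
    num_speculative: int = 4,
) -> Tuple[List[int], int, int]:
    """Return (tokens, accepted_draft_tokens, proposed_draft_tokens)."""
    tokens = list(prompt)
    accepted = proposed = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes num_speculative tokens autoregressively.
        draft_tokens: List[int] = []
        for _ in range(num_speculative):
            draft_tokens.append(draft(tokens + draft_tokens))
        proposed += len(draft_tokens)

        # 2. The target model verifies the proposals left to right. A real system
        #    scores all positions in ONE batched forward pass; this toy calls it per position.
        new_tokens: List[int] = []
        for tok in draft_tokens:
            expected = target(tokens + new_tokens)
            if expected != tok:
                new_tokens.append(expected)  # reject and substitute the target's token
                break
            new_tokens.append(tok)
            accepted += 1
        else:
            # All proposals accepted: the same target pass yields one extra "bonus" token.
            new_tokens.append(target(tokens + new_tokens))
        tokens.extend(new_tokens)
    return tokens[: len(prompt) + max_new_tokens], accepted, proposed


def target_model(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10  # toy target: counts 0..9


def draft_model(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10  # agrees except after token 4


out, accepted, proposed = speculative_decode_greedy(
    target_model, draft_model, [0], max_new_tokens=8, num_speculative=3
)
print(out, f"acceptance rate: {accepted / proposed:.2f}")
```

With greedy verification the output is identical to running the target model alone; the only thing speculation changes is how many target passes are needed, which this toy does not time.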
Features:
- ✅ Multiple model configurations (the same model as draft and target, or different draft and target models)
- ✅ Caching for performance optimization
- ✅ Sophisticated rejection strategies
- ✅ Memory usage monitoring
- ✅ Performance comparison with standard decoding
- ✅ GPU support
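The README does not spell out which rejection strategies the implementation uses. For sampled (non-greedy) decoding, the standard rule from the speculative sampling literature accepts a draft token x with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution, and on rejection resamples from the normalized residual max(0, p − q). The sketch below shows that rule over toy dictionaries; `accept_or_resample` is a hypothetical helper, not the repository's API.

```python
import random
from typing import Dict, Tuple

Dist = Dict[str, float]  # toy distribution: token -> probability


def accept_or_resample(token: str, p: Dist, q: Dist, rng: random.Random) -> Tuple[str, bool]:
    """Verify one draft token sampled from q against target p; return (kept_token, accepted)."""
    p_x, q_x = p.get(token, 0.0), q[token]
    # Accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_x / q_x):
        return token, True
    # Rejected: resample from the residual distribution max(0, p - q), renormalized.
    residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    r = rng.random() * sum(residual.values())
    for t, weight in residual.items():
        r -= weight
        if r < 0:
            return t, False
    return max(residual, key=residual.get), False  # guard against float round-off


# Usage: sample draft tokens from q, verify against p, and count what is kept.
rng = random.Random(0)
p = {"a": 0.7, "b": 0.3}  # target distribution
q = {"a": 0.3, "b": 0.7}  # draft distribution (deliberately miscalibrated)
counts = {"a": 0, "b": 0}
for _ in range(20_000):
    draft_token = rng.choices(list(q), weights=list(q.values()))[0]
    kept, _accepted = accept_or_resample(draft_token, p, q, rng)
    counts[kept] += 1
print({t: c / 20_000 for t, c in counts.items()})  # close to p, not q
```

The point of this rule is that the kept tokens follow the target distribution p exactly, so speculation trades compute for speed without changing what the target model would have sampled.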
Usage:
```bash
python examples/speculative_decoding/improved_speculative_decoding.py
```
Key Insights:
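- Acceptance rates vary by model combination
- Aggressive speculation (5 draft tokens per step) achieved the highest throughput: 11.03 tokens/sec
- Using different draft and target models produces realistic rejections (95.56% acceptance), whereas using the same model for both accepts every draft token
- Memory overhead: ~111 MB compared with standard decoding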
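A back-of-envelope model helps explain why acceptance rate alone does not determine speed. Under the simplifying assumption used by Leviathan et al. (2023) that each draft token is accepted independently with probability alpha, proposing gamma tokens yields (1 − alpha^(gamma+1)) / (1 − alpha) tokens per target pass on average; dividing by (gamma·c + 1), where c is the assumed cost of one draft pass relative to one target pass, estimates the wall-clock speedup. The function names below are illustrative, and the model ignores caching and batching effects.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass (i.i.d. acceptance assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft token accepted, plus one bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Estimated wall-clock speedup; c = draft pass cost / target pass cost."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)


for gamma in (1, 5):
    print(gamma, round(expected_tokens_per_step(0.9, gamma), 2), round(expected_speedup(0.9, gamma, 0.5), 2))
```

A high acceptance rate can still lose to standard decoding when the draft model is not much cheaper than the target (large c), which is one plausible reading of the "Different Models" row in the benchmark table below.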
Location: demos/speculative_decoding_demo.py
A Streamlit-based interactive demo that visualizes the speculative decoding process in real time, with live token generation and performance metrics.
🎨 Real-Time Features:
- Live Token Generation: Watch tokens appear step by step as they are generated
- Color-Coded Tokens:
  - 🟢 Green: Accepted tokens (✓)
  - 🔴 Red: Rejected tokens (✗), with the replacement shown
  - 🟡 Yellow: Replacement tokens from the target model
- Real-Time Metrics: Live updates of elapsed time, tokens generated, and tokens/sec
- Performance Comparison: Side-by-side comparison with standard decoding
- Interactive Parameter Tuning: Adjust temperature, speculative tokens, and model combinations
- Memory Usage Monitoring: Real-time resource tracking
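The live metrics (elapsed time, tokens generated, tokens/sec) reduce to a small counter around a monotonic clock. The class below is a hypothetical sketch, not the demo's code; it uses `time.perf_counter` by default and accepts an injected clock so the numbers can be checked deterministically.

```python
import time
from typing import Callable, Dict


class ThroughputTracker:
    """Tracks elapsed time, token count, and tokens/sec for one generation run."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the tracker can be tested with a fake clock
        self._start = clock()
        self.tokens = 0

    def add(self, n: int = 1) -> None:
        """Record n newly emitted tokens (e.g. all tokens kept in one speculative step)."""
        self.tokens += n

    def snapshot(self) -> Dict[str, float]:
        elapsed = self._clock() - self._start
        rate = self.tokens / elapsed if elapsed > 0 else 0.0
        return {"elapsed_s": elapsed, "tokens": float(self.tokens), "tokens_per_s": rate}


# Usage with a fake clock that advances 0.5 s per call, so the output is deterministic.
ticks = iter([0.0, 0.5, 1.0])
tracker = ThroughputTracker(clock=lambda: next(ticks))
tracker.add(3)
first = tracker.snapshot()   # after 0.5 s
tracker.add(2)
second = tracker.snapshot()  # after 1.0 s
print(first, second)
```

In a Streamlit app, each snapshot could be written into a placeholder created with `st.empty()` so the metrics refresh in place rather than appending new lines.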
Usage:
```bash
streamlit run demos/speculative_decoding_demo.py
```
Note:
- The speculative decoding demo is under active improvement. If you have suggestions for better real-time visualization, usability, or want to contribute, please open an issue or pull request!
- The demo now features improved color contrast, real-time token updates, and detailed step-by-step token visualization. However, further enhancements are welcome.
Location: playground/
A dedicated space for learning, experimentation, and practice. This includes:
- Pythonic Programming Exercises: Solutions to Educative.io's "Pythonic Programming Tips for Software Engineers" course
- Future Practice Files: Additional learning exercises and experiments
- Safe Experimentation: Try new features without affecting production examples
Usage:
```bash
# Run Pythonic examples
python playground/pythonic_examples.py

# Add new practice files
touch playground/new_practice.py
```
- PyTorch: Deep learning framework
- Transformers: Hugging Face model library
- Streamlit: Interactive web applications
- Gradio: Alternative demo framework
- Plotly: Interactive visualizations
- psutil: System monitoring
The interactive demo showcases:
- Live Token Visualization: Watch tokens being generated in real-time with color coding
- Performance Comparison: Side-by-side comparison with standard decoding
- Parameter Tuning: Adjust temperature, speculative tokens, and model combinations
- Visual Metrics: Charts showing acceptance rates, speed, and memory usage
- Model Selection: Choose from different model combinations
- Real-Time Metrics: Live updates of generation progress and performance
- 🟢 Accepted Tokens: Green with checkmark (✓) - Draft model prediction was correct
- 🔴 Rejected Tokens: Red with X (✗) - Draft model prediction was wrong, replaced with target model's prediction
- 🟡 Replacement Tokens: Yellow - Target model's replacement for rejected draft token
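One way to produce this color coding is to emit one inline HTML span per token. The sketch below is hypothetical (the status names, hex colors, and `render_tokens` are assumptions, only loosely following the legend above); note the `html.escape` call, which keeps tokens such as `<b>` from being interpreted as markup.

```python
import html
from typing import Iterable, Tuple

# Hypothetical status names -> (background color, text color, suffix mark).
TOKEN_STYLES = {
    "accepted": ("#2e7d32", "#ffffff", " ✓"),
    "rejected": ("#c62828", "#ffffff", " ✗"),
    "replacement": ("#fbc02d", "#000000", ""),  # dark text for contrast on yellow
}


def render_tokens(tokens: Iterable[Tuple[str, str]]) -> str:
    """Render (token_text, status) pairs as inline HTML spans."""
    spans = []
    for text, status in tokens:
        background, foreground, mark = TOKEN_STYLES[status]
        # Escape token text so it is displayed literally, never parsed as HTML.
        label = html.escape(text) + mark
        spans.append(
            f'<span style="background:{background};color:{foreground};'
            f'padding:0 4px;border-radius:3px">{label}</span>'
        )
    return " ".join(spans)


print(render_tokens([("The", "accepted"), ("cat", "rejected"), ("dog", "replacement")]))
```

In Streamlit, the resulting string could be displayed with `st.markdown(..., unsafe_allow_html=True)`; escaping the token text first matters precisely because that flag disables Streamlit's own HTML sanitizing.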
- Study the Examples: Understand the implementation patterns
- Run the Demos: See the techniques in action with real-time visualization
- Modify Parameters: Experiment with different configurations
- Add New Examples: Extend with your own AI engineering patterns
- Optimize for Scale: Consider using vLLM for production inference
- Add Monitoring: Implement proper logging and metrics
- Error Handling: Add robust error handling for production environments
- Testing: Add unit tests and integration tests
| Configuration | Acceptance Rate | Tokens/sec | Memory (MB) | Speedup |
|---|---|---|---|---|
| Same Model | 100% | 8.75 | 1507.6 | 1.0x |
| Different Models | 95.56% | 6.08 | 2510.5 | 0.7x |
| Conservative (1 token) | 100% | 8.34 | 2474.1 | 0.95x |
| Aggressive (5 tokens) | 100% | 11.03 | 3330.7 | 1.26x |
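The Speedup column in the table above is consistent, up to rounding, with each row's tokens/sec divided by the Same Model row's 8.75 tokens/sec. That reading is an inference from the numbers, not something the benchmark script states; the snippet below just reproduces the arithmetic from the table's values.

```python
# Tokens/sec values copied from the table above.
results = {
    "Same Model": 8.75,
    "Different Models": 6.08,
    "Conservative (1 token)": 8.34,
    "Aggressive (5 tokens)": 11.03,
}

baseline = results["Same Model"]  # assumed reference row (its speedup is 1.0x)
speedups = {name: tps / baseline for name, tps in results.items()}

for name, speedup in speedups.items():
    print(f"{name:<24} {speedup:.2f}x")
```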
- Create a new directory in `examples/`
- Add your implementation
- Include a README explaining the technique
- Add any dependencies to `requirements/requirements.txt`
- Consider creating a demo in `demos/`
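The first three steps can be scripted. The helper below is a hypothetical convenience, not part of the repository: it creates `examples/<name>/` containing a stub module and a README, and refuses to overwrite an existing example directory. Dependencies and demos are still added by hand.

```python
import tempfile
from pathlib import Path

STUB = '''"""TODO: describe the technique this example demonstrates."""


def main() -> None:
    pass


if __name__ == "__main__":
    main()
'''


def scaffold_example(repo_root: Path, name: str) -> Path:
    """Create examples/<name>/ with a stub implementation and README; return the directory."""
    example_dir = repo_root / "examples" / name
    example_dir.mkdir(parents=True, exist_ok=False)  # fail rather than clobber an existing example
    (example_dir / f"{name}.py").write_text(STUB, encoding="utf-8")
    (example_dir / "README.md").write_text(
        f"# {name}\n\nExplain the technique, how to run it, and what results to expect.\n",
        encoding="utf-8",
    )
    return example_dir


# Usage: scaffold into a throwaway directory to inspect the result.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_example(Path(tmp), "my_new_technique")
    relative = created.relative_to(tmp)
    names = sorted(p.name for p in created.iterdir())
print(relative, names)
```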
- Documentation: Always include clear documentation
- Performance: Monitor and optimize for performance
- Visualization: Include visualizations when possible
- Modularity: Keep code modular and reusable
- Testing: Add tests for critical functionality
This repository is designed for learning and experimentation. Feel free to:
- Add new AI engineering examples
- Improve existing implementations
- Create new interactive demos
- Suggest new techniques to explore
This repository is perfect for:
- AI Engineering Interviews: Demonstrate practical knowledge with real-time demos
- POC Development: Rapid prototyping of AI features
- Learning: Understanding modern AI techniques with visual feedback
- Research: Experimenting with new approaches
- Teaching: Visual demonstrations of AI concepts
This project is for educational and research purposes. Please respect the licenses of the underlying libraries and models used.
Happy AI Engineering! 🚀