
mac2bua/ai-engineering-examples

AI Engineering Examples

A collection of practical AI engineering examples, demos, and proof-of-concepts for modern AI applications. This repository serves as a playground for experimenting with cutting-edge AI techniques and preparing for AI engineering roles.

🚀 Quick Start

# Clone the repository
git clone <repository-url>
cd ai-engineering-examples

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements/requirements.txt

# Run examples
python examples/speculative_decoding/improved_speculative_decoding.py

# Run interactive demo
streamlit run demos/speculative_decoding_demo.py

📁 Repository Structure

ai-engineering-examples/
├── examples/                    # Production-ready AI engineering examples
│   └── speculative_decoding/   # Speculative decoding implementations
│       ├── improved_speculative_decoding.py
│       └── README.md
├── demos/                      # Interactive demos and visualizations
│   └── speculative_decoding_demo.py
├── playground/                 # Learning & experimentation sandbox
│   ├── pythonic_examples.py   # Pythonic programming exercises
│   └── README.md
├── requirements/               # Dependency management
│   └── requirements.txt
├── venv/                      # Virtual environment (not tracked)
├── .gitignore                 # Ignore unnecessary files
└── README.md                  # This file

🎯 Examples

1. Speculative Decoding

Location: examples/speculative_decoding/

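A comprehensive implementation of speculative decoding, a technique in which a smaller "draft" model proposes several tokens ahead and a larger "target" model then verifies them. Accepted tokens are kept; at the first rejected token, the target model's own prediction is used instead. This can speed up text generation when the draft model is much cheaper than the target model and most of its proposals are accepted.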
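
To make the draft-then-verify loop concrete, here is a minimal sketch using greedy decoding and toy stand-in models. It is not the code in improved_speculative_decoding.py: `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical names, and the "models" are plain functions from a token prefix to the next token.

```python
from typing import Callable, List, Tuple

# Assumption: a "model" is any function mapping a token prefix to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Standard decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int, int]:
    """Greedy speculative decoding. Returns (tokens, accepted, proposed)."""
    tokens = list(prompt)
    accepted = proposed = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        draft_tokens: List[int] = []
        for _ in range(k):
            draft_tokens.append(draft(tokens + draft_tokens))
        proposed += k

        # 2. The target model checks each proposal in order. A real
        #    implementation scores all k positions in one batched forward
        #    pass; this sketch loops for clarity.
        n_ok = 0
        for i, tok in enumerate(draft_tokens):
            if target(tokens + draft_tokens[:i]) != tok:
                break
            n_ok += 1
        accepted += n_ok
        tokens.extend(draft_tokens[:n_ok])

        # 3. The target always contributes one token: the correction after a
        #    rejection, or a bonus token when every proposal was accepted.
        tokens.append(target(tokens))
    return tokens[:len(prompt) + max_new_tokens], accepted, proposed


def toy_target(prefix: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


output, accepted, proposed = speculative_decode(toy_target, toy_draft, [0], 12, k=3)
print(output, f"acceptance rate: {accepted / proposed:.0%}")
```

With greedy verification the output matches what the target model would have produced on its own; the draft model only changes how many target calls are needed.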

Features:

  • ✅ Multiple model configurations (same model, different models)
  • ✅ Caching for performance optimization
  • ✅ Sophisticated rejection strategies
  • ✅ Memory usage monitoring
  • ✅ Performance comparison with standard decoding
  • ✅ GPU support
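
The list above does not say which rejection strategies the example implements. As a point of reference, the rule commonly used in the speculative sampling literature accepts a draft token with probability min(1, p/q), where p and q are the target and draft probabilities of that token, and otherwise resamples from the renormalized residual max(0, p - q). A minimal sketch follows; `verify_token` is a hypothetical helper and the distributions are plain dictionaries rather than model logits.

```python
import random
from typing import Dict


def verify_token(token: str, p_target: Dict[str, float],
                 q_draft: Dict[str, float], rng: random.Random) -> str:
    """Return the token to emit for one drafted position.

    Assumes `token` was sampled from `q_draft`, so q_draft[token] > 0.
    """
    p = p_target.get(token, 0.0)
    q = q_draft[token]
    # Accept with probability min(1, p / q).
    if rng.random() < min(1.0, p / q):
        return token
    # Rejected: resample from the residual distribution max(0, p - q).
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = {t: max(0.0, p_target[t] - q_draft.get(t, 0.0)) for t in p_target}
    candidates, weights = zip(*residual.items())
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
p_target = {"cat": 0.7, "dog": 0.3}
q_draft = {"cat": 0.4, "dog": 0.6}
drafted = rng.choices(list(q_draft), weights=list(q_draft.values()), k=1)[0]
print(drafted, "->", verify_token(drafted, p_target, q_draft, rng))
```

The point of this rule is that emitted tokens follow the target distribution exactly, whatever the draft distribution is; a poor draft model only lowers the acceptance rate.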

Usage:

python examples/speculative_decoding/improved_speculative_decoding.py

Key Insights:

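  • Acceptance rates vary by model combination: 100% when the draft and target are the same model, 95.56% with different models
  • Aggressive speculation (5 tokens) achieved the highest throughput: 11.03 tokens/sec
  • Pairing different draft and target models produces real rejections, and in this benchmark it ran slower than the same-model configuration (6.08 vs 8.75 tokens/sec)
  • Memory overhead: ~111 MB compared with standard decoding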
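
For intuition about how acceptance rate and speculation length interact, a common back-of-the-envelope model from the speculative decoding literature treats each draft token as accepted independently with probability alpha. The sketch below is that simplified model, not a measurement from this repository; `alpha`, `gamma`, and `cost_ratio` (draft cost divided by target cost per forward pass) are assumed inputs.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification pass.

    alpha: per-token acceptance probability (assumed independent per token).
    gamma: number of speculative tokens proposed per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if alpha == 1.0:
        # Every proposal is accepted, plus one bonus token from the target.
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over standard decoding.

    Each step costs gamma draft passes plus one target pass, measured in
    units of one target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1)


for gamma in (1, 3, 5):
    print(gamma, round(estimated_speedup(alpha=0.9, gamma=gamma, cost_ratio=0.1), 2))
```

Under this model, using the same model as both draft and target (cost_ratio of 1) gives no speedup even at 100% acceptance, because every accepted token still costs a full-size forward pass.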

2. Interactive Demo with Real-Time Visualization

Location: demos/speculative_decoding_demo.py

A Streamlit-based interactive demo that visualizes the speculative decoding process in real-time with live token generation and performance metrics.

🎨 Real-Time Features:

  • Live Token Generation: Watch tokens being generated step-by-step as they happen
  • Color-Coded Tokens:
    • 🟢 Green: Accepted tokens (✓)
    • 🔴 Red: Rejected tokens (✗) with replacement shown
    • 🟡 Yellow: Replacement tokens from target model
  • Real-Time Metrics: Live updates of elapsed time, tokens generated, and tokens/sec
  • Performance Comparison: Side-by-side comparison with standard decoding
  • Interactive Parameter Tuning: Adjust temperature, speculative tokens, and model combinations
  • Memory Usage Monitoring: Real-time resource tracking
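
The real-time metrics above (elapsed time, tokens generated, tokens/sec) can be tracked with a small counter object. This is a sketch of one possible approach, not the demo's actual code; `ThroughputMeter` is a hypothetical name, and the injectable `clock` argument is an assumption added to make the class easy to test.

```python
import time
from typing import Callable


class ThroughputMeter:
    """Tracks tokens generated and tokens per second since creation."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start = clock()
        self.tokens = 0

    def update(self, n_new: int = 1) -> None:
        """Record n_new freshly generated tokens."""
        self.tokens += n_new

    @property
    def elapsed(self) -> float:
        """Seconds since the meter was created."""
        return self._clock() - self._start

    @property
    def tokens_per_sec(self) -> float:
        elapsed = self.elapsed
        return self.tokens / elapsed if elapsed > 0 else 0.0


meter = ThroughputMeter()
for _ in range(5):
    time.sleep(0.01)  # stand-in for one generation step
    meter.update()
print(f"{meter.tokens} tokens, {meter.tokens_per_sec:.1f} tokens/sec")
```

In a Streamlit app, one way to display these values live is to create a placeholder with `st.empty()` and call its `metric` method after each `update`.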

Usage:

streamlit run demos/speculative_decoding_demo.py

Note:

  • The speculative decoding demo is under active improvement. If you have suggestions for better real-time visualization or usability, or want to contribute, please open an issue or pull request.
  • The demo currently offers improved color contrast, real-time token updates, and detailed step-by-step token visualization.

🎮 Playground

Location: playground/

A dedicated space for learning, experimentation, and practice. This includes:

  • Pythonic Programming Exercises: Solutions to Educative.io's "Pythonic Programming Tips for Software Engineers" course
  • Future Practice Files: Additional learning exercises and experiments
  • Safe Experimentation: Try new features without affecting production examples

Usage:

# Run Pythonic examples
python playground/pythonic_examples.py

# Add new practice files
touch playground/new_practice.py

🛠️ Technologies Used

  • PyTorch: Deep learning framework
  • Transformers: Hugging Face model library
  • Streamlit: Interactive web applications
  • Gradio: Alternative demo framework
  • Plotly: Interactive visualizations
  • psutil: System monitoring
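
As an illustration of the system monitoring role psutil plays here, the sketch below reads the current process's resident memory and measures the change around a function call. The helper names `current_memory_mb` and `memory_delta_mb` are hypothetical, and the standard-library fallback is an assumption added so the snippet also runs where psutil is not installed.

```python
import os
import sys
from typing import Any, Callable, Tuple


def current_memory_mb() -> float:
    """Resident memory of this process in MB."""
    try:
        import psutil
    except ImportError:
        # Fallback (Unix only): peak rather than current resident memory.
        # ru_maxrss is reported in bytes on macOS and kilobytes on Linux.
        import resource
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / (1024 ** 2) if sys.platform == "darwin" else peak / 1024
    return psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)


def memory_delta_mb(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, change in resident memory in MB)."""
    before = current_memory_mb()
    result = fn(*args, **kwargs)
    return result, current_memory_mb() - before


data, delta = memory_delta_mb(lambda: [0] * 5_000_000)
print(f"resident memory: {current_memory_mb():.1f} MB, delta: {delta:+.1f} MB")
```

Resident memory covers only host RAM; GPU memory would need a separate measurement.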

🎨 Demo Features

Real-Time Speculative Decoding Demo

The interactive demo showcases:

  1. Live Token Visualization: Watch tokens being generated in real-time with color coding
  2. Performance Comparison: Side-by-side comparison with standard decoding
  3. Parameter Tuning: Adjust temperature, speculative tokens, and model combinations
  4. Visual Metrics: Charts showing acceptance rates, speed, and memory usage
  5. Model Selection: Choose from different model combinations
  6. Real-Time Metrics: Live updates of generation progress and performance

Color-Coded Token Display

  • 🟢 Accepted Tokens: Green with checkmark (✓) - Draft model prediction was correct
  • 🔴 Rejected Tokens: Red with X (✗) - Draft model prediction was wrong, replaced with target model's prediction
  • 🟡 Replacement Tokens: Yellow - Target model's replacement for rejected draft token
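
One way to produce this kind of color-coded display is to render each token as a styled HTML span. The sketch below is an illustration rather than the demo's implementation: `TokenEvent`, `render_tokens`, and the specific colors are assumptions.

```python
from dataclasses import dataclass
from html import escape
from typing import List

# Assumed styling per status: (background color, text color, marker).
STYLES = {
    "accepted": ("#2e7d32", "#ffffff", "✓"),
    "rejected": ("#c62828", "#ffffff", "✗"),
    "replacement": ("#f9a825", "#000000", ""),
}


@dataclass
class TokenEvent:
    text: str
    status: str  # "accepted", "rejected", or "replacement"


def render_tokens(events: List[TokenEvent]) -> str:
    """Return an HTML string with one colored span per token."""
    spans = []
    for event in events:
        background, foreground, mark = STYLES[event.status]
        label = escape(event.text) + mark  # escape so tokens like "<b>" display literally
        spans.append(
            f'<span style="background:{background};color:{foreground};'
            f'padding:2px 4px;border-radius:3px">{label}</span>'
        )
    return " ".join(spans)


html_output = render_tokens([
    TokenEvent("The", "accepted"),
    TokenEvent("dog", "rejected"),
    TokenEvent("cat", "replacement"),
])
print(html_output)
```

In Streamlit, a string like this can be displayed with `st.markdown(html_output, unsafe_allow_html=True)`.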

🚀 Getting Started for AI Engineering

For Interviews and POCs

  1. Study the Examples: Understand the implementation patterns
  2. Run the Demos: See the techniques in action with real-time visualization
  3. Modify Parameters: Experiment with different configurations
  4. Add New Examples: Extend with your own AI engineering patterns

For Production Use

  1. Optimize for Scale: Consider using vLLM for production inference
  2. Add Monitoring: Implement proper logging and metrics
  3. Error Handling: Add robust error handling for production environments
  4. Testing: Add unit tests and integration tests

📊 Performance Benchmarks

Speculative Decoding Results

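| Configuration          | Acceptance Rate | Tokens/sec | Memory (MB) | Speedup |
|------------------------|-----------------|------------|-------------|---------|
| Same Model             | 100%            | 8.75       | 1507.6      | 1.0x    |
| Different Models       | 95.56%          | 6.08       | 2510.5      | 0.7x    |
| Conservative (1 token) | 100%            | 8.34       | 2474.1      | 0.95x   |
| Aggressive (5 tokens)  | 100%            | 11.03      | 3330.7      | 1.26x   |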
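
The table does not state what the Speedup column is measured against. The figures are consistent with dividing each configuration's tokens/sec by the "Same Model" row, so the sketch below assumes that baseline; `speedups` is a hypothetical helper.

```python
from typing import Dict

# Tokens/sec values copied from the table above.
TOKENS_PER_SEC = {
    "Same Model": 8.75,
    "Different Models": 6.08,
    "Conservative (1 token)": 8.34,
    "Aggressive (5 tokens)": 11.03,
}


def speedups(tokens_per_sec: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Throughput of each configuration relative to the baseline configuration."""
    base = tokens_per_sec[baseline]
    return {name: round(tps / base, 2) for name, tps in tokens_per_sec.items()}


# Assumption: "Same Model" is the baseline, since its speedup is listed as 1.0x.
for name, ratio in speedups(TOKENS_PER_SEC, baseline="Same Model").items():
    print(f"{name}: {ratio}x")
```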

🔧 Development

Adding New Examples

  1. Create a new directory in examples/
  2. Add your implementation
  3. Include a README explaining the technique
  4. Add any dependencies to requirements/requirements.txt
  5. Consider creating a demo in demos/

Best Practices

  • Documentation: Always include clear documentation
  • Performance: Monitor and optimize for performance
  • Visualization: Include visualizations when possible
  • Modularity: Keep code modular and reusable
  • Testing: Add tests for critical functionality

🤝 Contributing

This repository is designed for learning and experimentation. Feel free to:

  • Add new AI engineering examples
  • Improve existing implementations
  • Create new interactive demos
  • Suggest new techniques to explore

📚 Learning Resources

🎯 Use Cases

This repository is perfect for:

  • AI Engineering Interviews: Demonstrate practical knowledge with real-time demos
  • POC Development: Rapid prototyping of AI features
  • Learning: Understanding modern AI techniques with visual feedback
  • Research: Experimenting with new approaches
  • Teaching: Visual demonstrations of AI concepts

📄 License

This project is for educational and research purposes. Please respect the licenses of the underlying libraries and models used.


Happy AI Engineering! 🚀
