Cutting-Edge and Emerging Technologies from the Programming Interface Down to Hardware Acceleration of AI
The science and art of creating efficient AI systems spans the entire computing stack, from high-level language abstractions down to specialized hardware accelerators. This course provides a comprehensive exploration of AI compiler and runtime techniques, covering everything from language-level AI compiler systems (DSPy, SGLang, MTP, Guidance, LMQL) to hardware-level acceleration (GPU kernels, TPU compilation, custom ASICs, and emerging AI chips).
Students will learn how modern AI compilers and runtime systems like PyTorch, JAX, TVM, TensorRT, vLLM, and specialized LLM compilers orchestrate the full pipeline from prompt engineering and program synthesis down to optimized execution on heterogeneous hardware. The course covers the complete spectrum: prompt-level optimizations, graph-level transformations, kernel-level tuning, memory hierarchy optimization, and distributed system coordination.
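To make "language-level AI compilation" concrete, here is a minimal sketch of a declarative LM program in DSPy. This is an illustrative example, not course-provided code; it assumes a recent DSPy release (2.5+) and an OpenAI API key, and the model name is only a placeholder:

```python
import dspy

# Configure the backing language model (model name is a placeholder example).
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Declare *what* the program should do; DSPy compiles this signature
# into concrete prompts, which its optimizers can later tune.
qa = dspy.ChainOfThought("question -> answer")

result = qa(question="What does a kernel fusion pass do?")
print(result.answer)
```

Optimizers such as MIPROv2 (see the paper list below) then treat the generated prompts and few-shot demonstrations as a search space and tune them against a task metric, which is exactly the compiler-style workflow this course examines.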
- Explore language-level AI compiler techniques (prompt optimization, program synthesis, declarative AI programming) and traditional compiler optimizations (graph-level transformations, kernel tuning, memory management, distributed training)
- Work hands-on with cutting-edge AI compiler ecosystems (DSPy, SGLang, MTP, Guidance, LMQL, MLIR, TVM, PyTorch, CUDA) and heterogeneous hardware platforms (GPUs, TPUs, custom accelerators, emerging AI chips)
- Identify and focus on a specific research project within the scope of these technologies, demonstrating novel compiler/runtime optimizations for targeted AI workloads
- Present a capstone project that delves deep into a particular aspect of AI compilation, from language-level innovations to hardware-level breakthroughs
Project teams will be on the smaller side (~2–3 students); projects will involve selecting a focused research direction, designing targeted optimization approaches, building specialized compiler/runtime components, and benchmarking performance improvements.
- The comprehensive landscape of AI systems: from language-level AI compiler techniques (DSPy, SGLang, MTP) to hardware-level acceleration (GPU kernels, TPU compilation, custom ASICs)
- State-of-the-art techniques: prompt-level optimization, program synthesis, graph-level transformation, auto-tuning, quantization, inference acceleration, and emerging AI chip architectures (see the sketch after this list)
- Critical research skills: interpreting papers, evaluating cutting-edge systems, presenting technical ideas, and bridging the gap between high-level AI programming and low-level hardware optimization
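As a concrete taste of the graph-level side, the sketch below uses PyTorch 2's `torch.compile` (an illustrative example assuming PyTorch 2.x, not course-provided code): TorchDynamo captures the Python function as a graph, and TorchInductor lowers it to fused kernels.

```python
import torch
import torch.nn.functional as F

def mlp_block(x, w1, w2):
    # Matmul -> GELU -> matmul; the elementwise GELU is a classic fusion candidate.
    return F.gelu(x @ w1) @ w2

# TorchDynamo traces the function into an FX graph; TorchInductor then
# generates fused Triton/C++ kernels for the captured graph.
compiled_mlp = torch.compile(mlp_block)

x = torch.randn(32, 512)
w1 = torch.randn(512, 2048)
w2 = torch.randn(2048, 512)
print(compiled_mlp(x, w1, w2).shape)  # torch.Size([32, 512])
```

Timing the eager and compiled versions side by side is a quick way to see what graph capture buys; the Pytorch2 paper on the reading list explains the machinery in depth.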
Grading is research project-centric. You’ll showcase your project’s evolution through presentations, paper reviews, and final demos.
- Instructor: Jason Mars (📧 profmars@umich.edu)
- GSI: TBD
- Lecture: TBD
- Credits: 4
- Office Hours: On Demand
- GSI Office Hours: TBA
- Course Discussion: Piazza (TBD)
- Canvas: TBD
- Recorded Lectures: Available on Canvas
Week | Topics | Description | Notes/Links |
---|---|---|---|
Aug 25-27 | Course Introduction & Overview<br>Introduction to Compilers for AI | Lecture | |
Sep 1-3 | Labor Day (Holiday)<br>Foundations of AI Compiler Systems | Lecture | |
Sep 8-10 | [Open]<br>DSPy, TVM | Papers and Discussion | |
Sep 15-17 | MTP, Relay<br>GEPA, Ansor | Papers and Discussion | |
Sep 22-24 | Pytorch2, TorchBench<br>TorchTitan, ECLIP | Papers and Discussion | |
Sep 29 - Oct 1 | Triton, Geak<br>OpFusion, MemSafeXLA | Papers and Discussion | |
Oct 6-8 | Group Presentations<br>Group Presentations | Pitches | |
Oct 13-15 | Fall Study Break (Holiday)<br>MLIR, Glow | Papers and Discussion | |
Oct 20-22 | [Repo Deconstruct]<br>[Repo Deconstruct] | Tech Talks | |
Oct 27-29 | EffPagedAttn, EffLLMServ<br>NvidiaAmpere, AMDsDTW | Papers and Discussion | |
Nov 3-5 | Group Presentations<br>Group Presentations | Updates | |
Nov 10-12 | TPUs, MTIA<br>MLFleet + [Special Guest?] | Papers and Discussion | |
Nov 17-19 | [Repo Deconstruct]<br>[Repo Deconstruct] | Papers and Discussion | |
Nov 24-26 | Flex Day<br>Thanksgiving Recess (Holiday) | Presentations | |
Dec 1-3 | Final Project Presentations<br>Course Wrap-up & Future Directions | Presentations | |
# | Technology Category | Paper Title | Year | Link |
---|---|---|---|---|
1 | LMQL | Prompting Is Programming: A Query Language for Large Language Models | 2022 | Paper |
2 | DSPy | DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | 2023 | DSPy |
3 | DSPy | Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | 2024 | Paper |
4 | DSPy | GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning | 2025 | GEPA |
5 | SGLang | SGLang: Efficient Execution of Structured Language Model Programs | 2023 | Paper |
6 | MTP | Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | 2025 | MTP |
7 | Apache TVM | TVM: An Automated End-to-End Optimizing Compiler for Deep Learning | 2018 | TVM |
8 | Apache TVM | Relay: A High-Level Compiler for Deep Learning | 2019 | Relay |
9 | Apache TVM | Ansor: Generating High-Performance Tensor Programs for Deep Learning | 2020 | Ansor |
10 | PyTorch 2 | PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation | 2024 | Pytorch2 |
11 | PyTorch 2 | TorchBench: Benchmarking PyTorch with High API Surface Coverage | 2023 | TorchBench |
12 | PyTorch 2 | TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining | 2024 | TorchTitan |
13 | PyTorch ROCm | ECLIP: Energy-efficient and Practical Co-Location of ML Inference Pipelines on GPUs | 2025 | ECLIP |
14 | Triton | Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations | 2019 | Triton |
15 | Triton | Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks | 2025 | Geak |
16 | OpenXLA | Operator Fusion in XLA: Analysis and Evaluation | 2023 | OpFusion |
17 | OpenXLA | Memory Safe Computations with XLA Compiler | 2022 | MemSafeXLA |
18 | Google MLIR | MLIR: A Compiler Infrastructure for the End of Moore's Law | 2020 | MLIR |
19 | Meta Glow | Glow: Graph Lowering Compiler Techniques for Neural Networks | 2018 | Glow |
20 | vLLM | Efficient Memory Management for Large Language Model Serving with PagedAttention | 2023 | EffPagedAttn |
21 | vLLM | Effective Memory Management for Serving LLM with Heterogeneity | 2025 | EffLLMServ |
22 | GPU ISA & Architecture | Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis | 2023 | NvidiaAmpere |
23 | GPU ISA & Architecture | Cambricon: An Instruction Set Architecture for Neural Networks | 2016 | Cambricon |
24 | CUDA/ROCm | Optimizing sDTW for AMD GPUs | 2024 | AMDsDTW |
25 | TPU ISA & Architecture | TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings | 2023 | TPUs |
26 | TPU ISA & Architecture | MTIA: First Generation Silicon Targeting Meta's Recommendation Systems | 2023 | MTIA |
27 | TPU ISA & Architecture | Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput | 2024 | MLFleet |
28 | OpenVINO | OpenVINO Deep Learning Workbench: Comprehensive Analysis and Tuning of Neural Networks Inference | 2019 | Paper |
29 | OpenVINO | Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO | 2023 | Paper |
30 | Intel PlaidML | Stripe: Tensor Compilation via the Nested Polyhedral Model | 2019 | Paper |
Note: some compound AI papers, Graphine, and a few others are still to be added.
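For a flavor of the kernel-level papers above (Triton, CUDA/ROCm), here is a standard introductory Triton kernel, a vector add. It is a minimal sketch assuming the `triton` package and a CUDA-capable GPU, and is not taken from any of the listed papers:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)  # one program per tile
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=256)
assert torch.allclose(out, x + y)
```

The tile-plus-mask pattern here is the core idea the Triton paper generalizes, and the GPU architecture papers above examine what the hardware does with each tile.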
This is a very "do"-based course: we'll be learning, creating, innovating, and sharing. A significant portion of the grade is allocated to the research project. Most (if not all) students do very well in this class as long as they stay engaged on the journey. Let's create some amazing stuff! 😊
Research Project: 80%
- Project Pitch: 10%
- Project Update: 10%
- Video Lightning Talk: 15%
- Final Presentation: 20%
- Paper Write-Up: 25%
Participation, Impact, and Engagement: 20%
- Paper Presentations: 10%
- Repo Deconstructions: 5%
- Paper Vibe Logs: 5% (split evenly across the number of papers)
This course emphasizes collaborative learning and knowledge sharing. Students will actively participate in presentations that showcase their understanding and discoveries from the course materials.
Grading Basis
- Grading will be based on each student's GitHub repository for this class. Your repo is the official record of your work and deliverables.
Repository Registration
- Each student must enter their U-M uniquename and GitHub repository link in the "GitHub repos" tab of the same Google Sheet used for student presentations (https://docs.google.com/spreadsheets/d/1y7yw2zQt6hjsVg0bTg0fLS1cplkk9_-1nd-qYPybwfE/edit?usp=sharing).
Paper Vibe Logs
- Upload your paper vibe logs to your GitHub repository.
- Vibe logs are due the same day as the paper being presented.
- Each student submits their own vibe log (no group submissions).
- Each vibe log should contain about 5–10 questions you asked GPT about the paper being presented (see the example template below).
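To make the expected format concrete, here is one hypothetical structure for a vibe log (the exact layout is up to you; the paper name and questions are invented for illustration):

```markdown
# Vibe Log: PagedAttention (EffPagedAttn)

1. Q: How does PagedAttention's block table differ from an OS page table?
   What I learned: ...
2. Q: Why does contiguous KV-cache allocation waste GPU memory?
   What I learned: ...
(continue for 5–10 questions)
```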
A key component of our learning methodology is the "vibe learning" presentation. In these sessions, students will:
- Walk through AI conversation logs: Present a curated log of their conversations with AI systems where they learned interesting concepts from the course papers
- Teach back to the class: Use their AI interaction logs as a foundation to explain complex compiler and runtime concepts to their peers
- Demonstrate understanding: Show how they've internalized and can communicate technical concepts through their AI-assisted learning journey
- Share insights: Highlight unexpected discoveries, connections, and "aha moments" that emerged from their AI conversations
This approach leverages the power of AI as a learning companion while ensuring students develop deep understanding through the process of teaching others. Students will learn not just from the papers themselves, but from each other's unique learning paths and AI-assisted discoveries.
Students should sign up for a paper here (in the paper signup tab): https://docs.google.com/spreadsheets/d/1y7yw2zQt6hjsVg0bTg0fLS1cplkk9_-1nd-qYPybwfE/edit?usp=sharing
- Aim for 15–20 slides
- Cover the paper’s motivation, problem, key ideas/methodology, system/architecture, evaluation/results, limitations/trade-offs, and key takeaways
- Include relevant figures/diagrams/tables from the paper (or simplified redraws) to clearly tell the story
- Use a clear narrative: problem → idea → how it works → why it’s better → evidence → implications
- For style and pacing, watch recent paper talks on YouTube from top architecture/systems conferences such as ISCA, MICRO, ASPLOS, and HPCA
- https://medium.com/geekculture/ai-compilers-ae28afbc4907
- https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/
- https://rocm.blogs.amd.com/software-tools-optimization/amdgcn-isa/README.html
⭐ Prepare to build the next generation of compilers and runtimes for AI.