marsninja/CompilersForAIClass2025

⚡ Compilers and Runtimes for AI: From Prompts to Accelerators

EECS 598 · Fall 2025

Cutting-Edge and Emerging Technologies, from the Programming Interface down to Hardware Acceleration of AI


📖 Course Summary

The science and art of creating efficient AI systems spans the entire computing stack—from high-level language abstractions down to specialized hardware accelerators. This course provides a comprehensive exploration of AI compiler and runtime techniques, covering everything from language-level AI compiler systems (DSPy, SGLang, MTP, Guidance, LMQL) to hardware-level acceleration (GPU kernels, TPU compilation, custom ASICs, and emerging AI chips).

Students will learn how modern AI compilers and runtime systems like PyTorch, JAX, TVM, TensorRT, VLLM, and specialized LLM compilers orchestrate the full pipeline from prompt engineering and program synthesis down to optimized execution on heterogeneous hardware. The course covers the complete spectrum: prompt-level optimizations, graph-level transformations, kernel-level tuning, memory hierarchy optimization, and distributed system coordination.
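
As a toy illustration of one stage in this pipeline, the sketch below implements a graph-level transformation in plain Python: fusing a multiply feeding an add into a single fused multiply-add node. This is a deliberately minimal stand-in (all names here are hypothetical) for the far richer pattern-matching passes in systems like TVM, XLA, or torch.compile.

```python
# Toy graph-level "operator fusion" pass: rewrite add(mul(a, b), c)
# into a single fused multiply-add (fma) node. Real AI compilers apply
# this kind of rewrite across large tensor graphs.

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                          # "input", "mul", "add", or "fma"
    inputs: list = field(default_factory=list)

def fuse_mul_add(graph):
    """Replace each add whose first input is a mul with one fma node."""
    fused = []
    for node in graph:
        if node.op == "add" and node.inputs and node.inputs[0].op == "mul":
            mul = node.inputs[0]
            # fma takes the mul's two operands plus the add's remaining operand
            fused.append(Node("fma", mul.inputs + node.inputs[1:]))
        else:
            fused.append(node)
    return fused

# Build add(mul(x, y), z) and run the pass.
x, y, z = Node("input"), Node("input"), Node("input")
m = Node("mul", [x, y])
a = Node("add", [m, z])
result = fuse_mul_add([x, y, z, m, a])
print([n.op for n in result])  # the final "add" becomes "fma"
```

A production pass would also run dead-code elimination to drop the now-unused `mul` node, and would match many more operator combinations than this single pattern.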

What You’ll Do

  • Explore language-level AI compiler techniques (prompt optimization, program synthesis, declarative AI programming) and traditional compiler optimizations (graph-level transformations, kernel tuning, memory management, distributed training)
  • Work hands-on with cutting-edge AI compiler ecosystems (DSPy, SGLang, MTP, Guidance, LMQL, MLIR, TVM, PyTorch, CUDA) and heterogeneous hardware platforms (GPUs, TPUs, custom accelerators, emerging AI chips)
  • Identify and focus on a specific research project within the scope of these technologies, demonstrating novel compiler/runtime optimizations for targeted AI workloads
  • Present a capstone project that delves deep into a particular aspect of AI compilation, from language-level innovations to hardware-level breakthroughs
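
The language-level side of the list above can be sketched as a toy prompt optimizer: search over candidate instructions and keep whichever scores best on a small dev set. This is, in miniature, the loop that systems like DSPy and GEPA automate over real LLM calls; the model, prompts, and dev set below are hypothetical stand-ins.

```python
# Toy prompt optimizer: pick the candidate instruction that maximizes
# accuracy of a (stubbed) model on a tiny dev set of arithmetic questions.

def fake_model(instruction, question):
    # Stand-in for an LLM call: in this toy, only the "step by step"
    # instruction produces correct arithmetic.
    if "step by step" in instruction:
        return str(eval(question))   # questions are arithmetic expressions
    return "42"

def accuracy(instruction, dev_set):
    hits = sum(fake_model(instruction, q) == ans for q, ans in dev_set)
    return hits / len(dev_set)

candidates = [
    "Answer the question.",
    "Think step by step, then answer.",
]
dev_set = [("2+3", "5"), ("10*4", "40"), ("7-5", "2")]

# The "optimization": exhaustive search over candidate instructions.
best = max(candidates, key=lambda c: accuracy(c, dev_set))
print(best)
```

Real prompt optimizers replace the exhaustive search with bootstrapped demonstrations, reflective mutation, or evolutionary search, but the structure (candidates, a metric, a dev set) is the same.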

Project teams will be on the smaller side (~2–3 students). Projects will include selecting a focused research direction, designing targeted optimization approaches, building specialized compiler/runtime components, and benchmarking performance improvements.

What You’ll Learn

  • The comprehensive landscape of AI systems: from language-level AI compiler techniques (DSPy, SGLang, MTP) to hardware-level acceleration (GPU kernels, TPU compilation, custom ASICs)
  • State-of-the-art techniques: prompt-level optimization, program synthesis, graph-level transformation, auto-tuning, quantization, inference acceleration, and emerging AI chip architectures
  • Critical research skills: interpreting papers, evaluating cutting-edge systems, presenting technical ideas, and bridging the gap between high-level AI programming and low-level hardware optimization
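
To make one of the listed techniques concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. Production systems use per-channel scales, calibration data, and fused dequantization kernels, so treat this purely as an illustration of the core idea.

```python
# Minimal symmetric int8 quantization: map floats onto integer codes in
# [-127, 127] using a single scale, then dequantize to measure the error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -1.4, 0.75, 3.0, -0.001]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # integer codes
print(max_err)  # worst-case round-trip error, bounded by scale / 2
```

The round-trip error is at most half the scale, which is why quantization works well when weight magnitudes are concentrated and poorly when a single outlier inflates the scale.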

Grading is research project-centric. You’ll showcase your project’s evolution through presentations, paper reviews, and final demos.


👨‍🏫 Instruction Team


🗓 Logistics

  • Lecture: TBD
  • Credits: 4
  • Office Hours: On Demand
  • GSI Office Hours: TBA
  • Course Discussion: Piazza (TBD)
  • Canvas: TBD
  • Recorded Lectures: Available on Canvas

📅 Schedule

| Week | Topics | Description | Notes/Links |
|---|---|---|---|
| Aug 25-27 | Course Introduction & Overview | Introduction to Compilers for AI | Lecture |
| Sep 1-3 | Labor Day (Holiday) | Foundations of AI Compiler Systems | Lecture |
| Sep 8-10 | [Open] | DSPy, TVM | Papers and Discussion |
| Sep 15-17 | MTP, Relay | GEPA, Ansor | Papers and Discussion |
| Sep 22-24 | Pytorch2, TorchBench | TorchTitan, ECLIP | Papers and Discussion |
| Sep 29 - Oct 1 | Triton, Geak | OpFusion, MemSafeXLA | Papers and Discussion |
| Oct 6-8 | Group Presentations | Group Presentations | Pitches |
| Oct 13-15 | Fall Study Break (Holiday) | MLIR, Glow | Papers and Discussion |
| Oct 20-22 | [Repo Deconstruct] | [Repo Deconstruct] | Tech Talks |
| Oct 27-29 | EffPagedAttn, EffLLMServ | NvidiaAmpere, AMDsDTW | Papers and Discussion |
| Nov 3-5 | Group Presentations | Group Presentations | Updates |
| Nov 10-12 | TPUs, MTIA | MLFleet + [Special Guest?] | Papers and Discussion |
| Nov 17-19 | [Repo Deconstruct] | [Repo Deconstruct] | Papers and Discussion |
| Nov 24-26 | Flex Day | Thanksgiving Recess (Holiday) | Presentations |
| Dec 1-3 | Final Project Presentations | Course Wrap-up & Future Directions | Presentations |

📂 Tentative List of Papers

| # | Technology Category | Paper Title | Year | Link |
|---|---|---|---|---|
| 1 | LMQL | Prompting Is Programming: A Query Language for Large Language Models | 2022 | Paper |
| 2 | DSPy | DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | 2023 | DSPy |
| 3 | DSPy | Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | 2024 | Paper |
| 4 | DSPy | GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning | 2025 | GEPA |
| 5 | SGLang | SGLang: Efficient Execution of Structured Language Model Programs | 2023 | Paper |
| 6 | MTP | Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | 2025 | MTP |
| 7 | Apache TVM | TVM: An Automated End-to-End Optimizing Compiler for Deep Learning | 2018 | TVM |
| 8 | Apache TVM | Relay: A High-Level Compiler for Deep Learning | 2019 | Relay |
| 9 | Apache TVM | Ansor: Generating High-Performance Tensor Programs for Deep Learning | 2020 | Ansor |
| 10 | PyTorch 2 | PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode and Graph Compilation for DNNs | 2024 | Pytorch2 |
| 11 | PyTorch 2 | TorchBench: Benchmarking PyTorch with High API Surface Coverage | 2023 | TorchBench |
| 12 | PyTorch 2 | TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining | 2024 | TorchTitan |
| 13 | PyTorch ROCm | ECLIP: Energy-efficient and Practical Co-Location of ML Inference Pipelines on GPUs | 2025 | ECLIP |
| 14 | Triton | Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations | 2019 | Triton |
| 15 | Triton | Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks | 2025 | Geak |
| 16 | OpenXLA | Operator Fusion in XLA: Analysis and Evaluation | 2023 | OpFusion |
| 17 | OpenXLA | Memory Safe Computations with XLA Compiler | 2022 | MemSafeXLA |
| 18 | Google MLIR | MLIR: A Compiler Infrastructure for the End of Moore's Law | 2020 | MLIR |
| 19 | Meta Glow | Glow: Graph Lowering Compiler Techniques for Neural Networks | 2018 | Glow |
| 20 | vLLM | Efficient Memory Management for Large Language Model Serving with PagedAttention | 2023 | EffPagedAttn |
| 21 | vLLM | Effective Memory Management for Serving LLM with Heterogeneity | 2025 | EffLLMServ |
| 22 | GPU ISA & Architecture | Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis | 2023 | NvidiaAmpere |
| 23 | GPU ISA & Architecture | Cambricon: An Instruction Set Architecture for Neural Networks | 2016 | Cambricon |
| 24 | CUDA/ROCm | Optimizing sDTW for AMD GPUs | 2024 | AMDsDTW |
| 25 | TPU ISA & Architecture | TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings | 2023 | TPUs |
| 26 | TPU ISA & Architecture | MTIA: First Generation Silicon Targeting Meta's Recommendation Systems | 2023 | MTIA |
| 27 | TPU ISA & Architecture | Machine Learning Fleet Efficiency: Analyzing and Optimizing Large-Scale Google TPU Systems with ML Productivity Goodput | 2016 | MLFleet |
| 28 | OpenVINO | OpenVINO Deep Learning Workbench: Comprehensive Analysis and Tuning of Neural Networks Inference | 2019 | Paper |
| 29 | OpenVINO | Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO | 2023 | Paper |
| 30 | Intel PlaidML | Stripe: Tensor Compilation via the Nested Polyhedral Model | 2019 | Paper |

Note: some compound AI papers, Graphine, and a few others are still to be added.


📊 Grading

This is a very 'do-based' course: we'll be learning, creating, innovating, and sharing. A significant portion of the grade is allocated to the research project. Most (if not all) students do very well in this class as long as they stay engaged on the journey. Let's create some amazing stuff! 😊

Research Project: 80%

  • Project Pitch: 10%
  • Project Update: 10%
  • Video Lightning Talk: 15%
  • Final Presentation: 20%
  • Paper Write-Up: 25%

Participation, Impact, and Engagement: 20%

  • Paper Presentations: 10%
  • Repo Deconstructions: 5%
  • Paper Vibe Logs: 5% (split evenly across the number of papers)

🎯 How We'll Work

This course emphasizes collaborative learning and knowledge sharing. Students will actively participate in presentations that showcase their understanding and discoveries from the course materials.

Grading and Logistics

Grading Basis

  • Grading will be based on each student's GitHub repository for this class. Your repo is the official record of your work and deliverables.

Repository Registration

Paper Vibe Logs

  • Upload your paper vibe logs to your GitHub repository.
  • Vibe logs are due the same day as the paper being presented.
  • Each student submits their own vibe log (no group submissions).
  • Each vibe log should contain about 5–10 questions you asked GPT about the paper being presented.

"Vibe Learning" Log

A key component of our learning methodology is the "vibe learning" presentation. In these sessions, students will:

  • Walk through AI conversation logs: Present a curated log of their conversations with AI systems where they learned interesting concepts from the course papers
  • Teach back to the class: Use their AI interaction logs as a foundation to explain complex compiler and runtime concepts to their peers
  • Demonstrate understanding: Show how they've internalized and can communicate technical concepts through their AI-assisted learning journey
  • Share insights: Highlight unexpected discoveries, connections, and "aha moments" that emerged from their AI conversations

This approach leverages the power of AI as a learning companion while ensuring students develop deep understanding through the process of teaching others. Students will learn not just from the papers themselves, but from each other's unique learning paths and AI-assisted discoveries.

Student Presentations

Students should sign up for a paper here (in the paper signup tab): https://docs.google.com/spreadsheets/d/1y7yw2zQt6hjsVg0bTg0fLS1cplkk9_-1nd-qYPybwfE/edit?usp=sharing

Presentation Slides

  • Aim for 15–20 slides
  • Cover the paper’s motivation, problem, key ideas/methodology, system/architecture, evaluation/results, limitations/trade-offs, and key takeaways
  • Include relevant figures/diagrams/tables from the paper (or simplified redraws) to clearly tell the story
  • Use a clear narrative: problem → idea → how it works → why it’s better → evidence → implications
  • For style and pacing, watch recent paper talks on YouTube from top architecture/systems conferences such as ISCA, MICRO, ASPLOS, and HPCA

🛠 Notes


Prepare to build the next generation of compilers and runtimes for AI.
