Here is a collection of valuable machine learning / deep learning resources, including courses, publications, and projects. Quick notes and highlights are annotated where possible.
- CS 329S: Machine Learning Systems Design (Stanford'21)
- CS 294: Machine Learning Systems (Fall 2019) (UCBRISE'19)
- EECS 598: Systems for AI (W'21) (hosted by Mosharaf Chowdhury)
A very good collection of concepts and key papers spanning a wide range of this topic
- UNIT: Unifying Tensorized Instruction Compilation (CGO'21)
- taco: A Tool to Generate Tensor Algebra Kernels (ASE'17, following works, code)
- Sparse GPU Kernels for Deep Learning (SC20)
This paper presents practical and efficient techniques for optimizing general-purpose sparse kernels on GPUs, especially SpMM (sparse matrix-dense matrix multiplication).
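To make the SpMM operation concrete, here is a minimal NumPy sketch of a CSR-based SpMM kernel. This is only an illustration of the memory-access pattern such kernels optimize, not the paper's GPU implementation; the function name and layout are my own.

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """Compute C = A @ B where A is sparse in CSR form and B is dense.

    indptr[row] .. indptr[row+1] delimits the nonzeros of each row;
    indices holds their column ids, data their values.
    """
    m = len(indptr) - 1
    C = np.zeros((m, B.shape[1]))
    for row in range(m):
        for k in range(indptr[row], indptr[row + 1]):
            # Each nonzero A[row, col] scales one full row of B --
            # the row-major streaming access SpMM kernels exploit.
            C[row] += data[k] * B[indices[k]]
    return C

# A = [[2, 0], [0, 3]] encoded in CSR
indptr, indices, data = [0, 1, 2], [0, 1], [2.0, 3.0]
B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(spmm_csr(indptr, indices, data, B))  # [[2. 4.], [9. 12.]]
```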
- Tony Nowatzki (Assistant Professor of Computer Science @UCLA)
It is important to reduce the redundancy of over-parameterized DNN models before real deployment. Pruning (structured or unstructured) and quantization are widely adopted compression methods. There are actually two problems to address in this area: how to identify the redundancy (algorithmic) and how to leverage the redundancy for speedup (systemic).
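The two standard steps above can be sketched in a few lines of NumPy: unstructured magnitude pruning to identify redundancy, followed by uniform 8-bit quantization. This is a simplified illustration with made-up parameters (50% sparsity, symmetric int8 scaling), not any specific paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # stand-in for a trained weight matrix

# Unstructured magnitude pruning: zero out the 50% of weights
# with the smallest absolute value (identifying redundancy).
threshold = np.quantile(np.abs(W), 0.5)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Uniform symmetric quantization: map survivors to int8
# (turning redundancy into smaller, faster arithmetic).
scale = np.abs(W_pruned).max() / 127
W_q = np.round(W_pruned / scale).astype(np.int8)
W_deq = W_q.astype(np.float32) * scale  # dequantized approximation

sparsity = (W_pruned == 0).mean()
max_err = np.abs(W_pruned - W_deq).max()
print(sparsity, max_err)  # ~0.5 sparsity, small quantization error
```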
- MingSun-Tse/EfficientDNNs
Collection of recent methods on DNN compression and acceleration
- awesome-fast-attention
- Compressed Transformer
- SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning (HPCA'21)
This paper covers both the algorithmic optimizations and the hardware architecture fitted to them. The proposed algorithms include pruning (cascade token/head pruning) and quantization (progressive quantization).
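As a rough intuition for cascade token pruning, here is a toy sketch: tokens that receive little cumulative attention are dropped so that later layers process fewer tokens. This is a heavy simplification of SpAtten's mechanism, with invented function names, shown only to convey the idea.

```python
import numpy as np

def prune_tokens(attn, k):
    """Keep the k tokens with the highest cumulative attention.

    attn: (num_tokens, num_tokens) row-softmaxed attention matrix.
    Returns the kept token indices in their original order.
    """
    importance = attn.sum(axis=0)  # how much each token is attended to overall
    keep = np.sort(np.argsort(importance)[-k:])
    return keep

attn = np.array([[0.6, 0.2, 0.2],
                 [0.5, 0.4, 0.1],
                 [0.7, 0.2, 0.1]])
print(prune_tokens(attn, 2))  # tokens 0 and 1 accumulate the most attention
```

In the cascade scheme, this selection is applied repeatedly across layers, so the pruning compounds as the sequence flows through the network.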
Recent works from Song Han's team
- SpArch: Efficient Architecture for Sparse Matrix Multiplication (HPCA'20)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy (CVPR'20, code)
- Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20, code)
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)
- HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)
- Defensive Quantization: When Efficiency Meets Robustness (ICLR'19)
Building production-level machine learning applications involves much more than training a neural network model. This section collects principles and methodologies that serve as technical / engineering guidelines for it.
- Machine Learning Yearning (Chinese version)
- Approaching (Almost) Any Machine Learning Problem
- Machine Learning Engineering for Production (MLOps) Specialization (Coursera by Andrew Ng, talk)
- EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
- alirezadir/Production-Level-Deep-Learning
A guideline for building practical production-level deep learning systems to be deployed in real world applications
- Machine learning system design
A primer for machine learning system design interviews, published on Medium. The author lists some valuable questions to ask when designing a practical ML system.
- Energy and Policy Considerations for Deep Learning in NLP
It was among the first to report that training a single deep learning model can emit up to 626,155 pounds of CO2, roughly equal to the total lifetime carbon footprint of five cars
- Green AI
- ML CO2 Impact