This release significantly expands the project with new chapters on multi-layer perceptrons (MLPs), optimization algorithms, and Vision Transformers (ViT), along with additional PyTorch fundamentals. The accompanying dnnl library has also been extended with new neural network components and model implementations.
New Notebooks
Chapter 3: Multi-Layer Perceptron: From Single Layer to Deep Nonlinear Modeling
- 3.1 From Linear Classifiers to MLPs: Why We Need Hidden Layers
- 3.2 Activation Functions: Adding Nonlinearity to Neural Networks
- 3.3 Softmax and Cross Entropy: From Logits to Classification Loss
- 3.4 Forward and Backward Propagation of Linear Layers
- 3.5 Building a Complete MLP with NumPy
- 3.6 Train MLP on MNIST with NumPy
- 3.7 Backward Propagation Check: Using Numerical Gradients to Verify Handwritten Backward
- 3.8 Reimplementing MLP with PyTorch nn.Module
Chapter 4: Optimization Algorithms: How Neural Networks Update Parameters
- 4.1 From Gradient Descent to SGD
- 4.2 Momentum and Nesterov Momentum
- 4.3 Adagrad: Adapting the Learning Rate for Each Parameter
- 4.4 RMSprop and Adadelta: Fixing Adagrad's Learning-rate Decay
- 4.5 Adam: Combining Momentum and Adaptive Scaling
- 4.6 AdamW: Decoupling Weight Decay from Adam
- 4.7 Muon: Orthogonalizing Matrix Updates
- 4.8 Optimizer Map: When to Use Which Optimization Algorithm
- 4.9 Learning Rate Schedulers: How the Learning Rate Changes During Training
Chapter 11: Vision Transformer: From Image Classification to Visual Sequence Modeling
- 11.1 From CNN to Vision Transformer: Treating Images as Sequences
- 11.2 Patch Embedding: Cutting Images into Tokens
- 11.3 Class Token and Positional Embedding: Letting a Sequence Represent the Whole Image
- 11.4 ViT Encoder: Letting Patch Tokens Exchange Information
- 11.5 ViT Backbone: Pretraining and Fine-tuning
dnnl Package Updates
- Added NumPy-based implementations of common neural network building blocks, including linear layers, activation functions, loss functions, normalization layers, and optimizers.
- Added a complete NumPy MLP implementation with forward propagation, backpropagation, gradient checking, and MNIST training examples.
- Added Vision Transformer (ViT) components, including patch embedding, class tokens, positional embeddings, Transformer encoders, and classification heads.
- Expanded Transformer-related modules and improved interoperability between educational examples and reusable library code.
- Added optimizer implementations including SGD, momentum, Nesterov momentum, Adagrad, RMSprop, Adam, AdamW, and Muon.
- Added learning rate schedulers and other optimizer-related utilities.
- Improved package organization and documentation across neural network, optimization, and vision-related modules.
- Expanded test coverage and examples for newly introduced models and optimization algorithms.
- Updated package metadata, dependencies, CI workflows, and development tooling.
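As a rough illustration of what a learning rate scheduler does, here is a minimal, standalone sketch of linear warmup followed by cosine decay. It is not the dnnl API: the function name `warmup_cosine_lr` and all of its parameters are hypothetical, chosen only for this example, and the schedule itself is one common choice rather than necessarily the one implemented in the package.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Hypothetical helper (not part of dnnl): learning rate at a 0-indexed step.

    The rate ramps up linearly for `warmup_steps` steps, then follows a
    half-cosine from `base_lr` down to `min_lr` at `total_steps`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp: base_lr / warmup_steps at step 0, base_lr at the last warmup step.
        return base_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, (step - warmup_steps) / decay_steps))

    # Cosine decay: base_lr at progress 0, min_lr at progress 1.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In a training loop, a function like this would typically be called once per step and its result written into the optimizer's learning rate before the parameter update.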
CI Updates
- Migrated GitHub Actions workflows to use GitHub Artifact Attestations for build provenance and artifact verification.
- Replaced Quarto _freeze caching with GitHub Actions cache to reduce repository size and improve CI performance.
- Improved workflow reliability and build reproducibility across documentation and package pipelines.
Merged Pull Requests
- Bump numpy from 2.4.5 to 2.4.6 by @dependabot[bot] in #5
- Update transformers requirement from ~=5.8.0 to ~=5.9.0 by @dependabot[bot] in #8
- Fix view operations for q, k, v in multi-head attention by @kbyy123 in #11
- Fix typo in decoder explanation by @kbyy123 in #12
- [en] Fix formula rendering issues in ch1.3 based on CN version by @wqpwqp1222 in #13
- Update dependency gdown to >=6.1.0,<6.2.0 by @renovate[bot] in #16
- Update dependency scikit-learn to >=1.9.0,<1.10.0 by @renovate[bot] in #17
- Remove extra 'not' in zero_grad example code for both zh and en versions by @wqpwqp1222 in #18
- Update dependency transformers to >=5.10.1,<5.11.0 by @renovate[bot] in #19
- Update dependency datasets to v5 by @renovate[bot] in #20
- Update dependency diffusers to >=0.38.0,<0.39.0 by @renovate[bot] in #24
- Update dependency transformers to >=5.11.0,<5.12.0 by @renovate[bot] in #25
New Contributors
- @kbyy123 made their first contribution in #11
- @wqpwqp1222 made their first contribution in #13
- @renovate[bot] made their first contribution in #16
Note
This project remains an open, continuously growing collection of deep learning study notes, maintained in both Chinese and English and built with Quarto.
Full Changelog: v2026.05.09...v2026.06.11