Commit 3e277b2

fix typos and add some link
1 parent 9688fdb commit 3e277b2

File tree

1 file changed (+8 -8 lines changed)


docs/auto-diff-tutorial-2.md

Lines changed: 8 additions & 8 deletions
@@ -9,11 +9,11 @@ intro_image_hide_on_mobile: false

## What Will We Learn

-In the last tutorial, we explain basic knowledge about Slang's autodiff feature and introduce some advanced techniques about custom derivative implementation and custom differential type.
+In the [last tutorial](auto-diff-tutorial-1.md), we explained the basics of Slang's autodiff feature and introduced some advanced techniques for custom derivative implementations and custom differential types.

In this tutorial, we will walk through a real-world example that combines what we've learned from the last tutorial.

-We will learn how to implement a tiny Multi-Layer Perceptron (MLP) to approximate a set of polynomial functions using Slang\'s automatic differentiation capabilities.
+We will learn how to implement a tiny Multi-Layer Perceptron (MLP) to approximate a set of polynomial functions using Slang's automatic differentiation capabilities.
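
In general terms, if $f(x, y)$ denotes a target polynomial and $\mathrm{MLP}_\theta$ our network with parameters $\theta$, training searches for parameters that minimize a loss over the sampled inputs, for example a squared error of the form:

$$\min_\theta \sum_{(x, y)} \big(\mathrm{MLP}_\theta(x, y) - f(x, y)\big)^2$$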

### Problem Setup

@@ -78,7 +78,7 @@ public struct MyNetwork
}
```

-In our interface methods design, we introduce an internal representation of vector type `MLVec<4>` in the network evaluation. And this design choice will help us hide the details about how we are going to perform the arithmetic/logic operations on the vector type, as they are irrelevant to how the network should be designed. For now, you can treat this type as an opaque type, and we will talk about the detail in later section. Therefore, in the public method, we will just box the input to `MLVec`, call internal `_eval` method, and unbox MLVec to normal vector. And in the internal `_eval` method, we will just chain the evaluations of each layer together.
+In our interface design, we introduce an internal vector representation, `MLVec<4>`, for the network evaluation. This design choice helps us hide the details of how we perform arithmetic/logic operations on the vector type, since they are irrelevant to how the network itself is designed. For now, you can treat it as an opaque type; we will cover the details in a later section. In the public method, we simply box the input into an `MLVec`, call the internal `_eval` method, and unbox the resulting `MLVec` back into a normal vector. In the internal `_eval` method, we just chain the evaluations of each layer together.
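
To make the shape of this design concrete, here is a minimal sketch of what the two methods might look like; the layer members, dimensions, and the way the scalars are packed into `MLVec<4>` are illustrative assumptions, not the tutorial's actual code:

```hlsl
// Illustrative sketch only -- layer members, dimensions, and the way scalars
// are packed into MLVec<4> are assumptions, not the tutorial's actual code.
public struct MyNetwork
{
    FeedForwardLayer layer0;
    FeedForwardLayer layer1;
    FeedForwardLayer layer2;

    // Public entry point: box the scalar inputs into an MLVec, evaluate,
    // then unbox the result back into a normal scalar.
    public half eval(half x, half y)
    {
        MLVec<4> input;
        input.data[0] = x;
        input.data[1] = y;
        input.data[2] = 0.0h;
        input.data[3] = 0.0h;

        MLVec<4> output = _eval(input);
        return output.data[0];
    }

    // Internal evaluation: just chain the layers together.
    MLVec<4> _eval(MLVec<4> input)
    {
        MLVec<4> h0 = layer0.eval(input);
        MLVec<4> h1 = layer1.eval(h0);
        return layer2.eval(h1);
    }
}
```

The important point is the split: `eval` only handles the boxing and unboxing, while `_eval` only composes layers, so all vector arithmetic stays hidden behind `MLVec`.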

### 3.2 Feed-Forward Layer Definition

@@ -176,7 +176,7 @@ After implementing the forward pass of the network evaluation, we will need to i

### 4.1 Backward Pass of Loss

-To implement the backward derivative of loss function, we only need to mark the function is `[Differentiable]` as we learnt from last tutorial:
+To implement the backward derivative of the loss function, we only need to mark the function as `[Differentiable]`, as we learned in the [last tutorial](auto-diff-tutorial-1.md):

```hlsl
[Differentiable]
@@ -194,7 +194,7 @@ And from the slang kernel function, we will just need to call the backward of lo
bwd_diff(loss)(network, input.x, input.y, 1.0h);
```

-One important thing to notice is that we are using no_diff attribute to decorate the input x, y and **groudtruth** calculation, because in the backward pass, we only care about the result of $\frac{\partial loss}{\partial\theta}$. no_diff attribute just tells Slang to treat the variables or instructions as non-differentiable, so there will be no backward mode instructions generated for those variables or instructions. In this case, since we don't care about the derivative of loss function with respective of input, therefore we can safely mark them as non-differentiable.
+One important thing to notice is that we use the `no_diff` attribute to decorate the inputs `x`, `y` and the `groudtruth` calculation, because in the backward pass we only care about the result of $\frac{\partial loss}{\partial\theta}$. The `no_diff` attribute tells Slang to treat the decorated variables or instructions as non-differentiable, so no backward-mode instructions will be generated for them. In this case, since we don't care about the derivative of the loss function with respect to the inputs, we can safely mark them as non-differentiable.
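
As a rough illustration, a loss function decorated this way might look like the sketch below; the exact signature, the squared-error form, and the hypothetical `targetPolynomial` helper are assumptions, while the real loss lives in the sample code:

```hlsl
// Illustrative sketch -- the real loss and ground-truth computation may differ.
[Differentiable]
half loss(MyNetwork network, no_diff half x, no_diff half y)
{
    // The ground truth is computed outside the differentiable graph:
    // no backward-mode code is generated for this call.
    half groundTruth = no_diff targetPolynomial(x, y);

    // Only the network evaluation carries derivatives back to the parameters.
    half prediction = network.eval(x, y);
    half diff = prediction - groundTruth;
    return diff * diff;
}
```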

This implementation requires that the call `network.eval(x, y)` be differentiable, so next we are going to implement the backward pass for this method.

@@ -225,7 +225,7 @@ public struct MyNetwork

### 4.3 Custom Backward Pass for Layers

-Following the propagation direction of the gradients, we will next implement the backward derivative of FeedForwardLayer. But we're going to do something different. Instead of asking Slang to automatically synthesize the backward autodiff for us, we will provide a custom derivative implementation. Because the network parameters and gradients are a buffer storage declared in the layer, we will have to provide a custom derivative to write the gradient back to the global buffer storage, you can reference \[**How to propagate derivatives to global buffer storage\ ** in last tutorial to refresh your memory. Another reason is that our layer is just matrix multiplication with bias, and its derivative is quite simple, and there are lots of options to even accelerate it with specific hardware (e.g. Nvidia tensor core). Therefore, it's good practice to implement the custom derivative.
+Following the propagation direction of the gradients, we will next implement the backward derivative of `FeedForwardLayer`. But we're going to do something different: instead of asking Slang to automatically synthesize the backward autodiff for us, we will provide a custom derivative implementation. Because the network parameters and gradients live in buffer storage declared in the layer, we have to provide a custom derivative that writes the gradients back to the global buffer storage; you can refer to [how to propagate derivatives to global buffer storage](auto-diff-tutorial-1.md#how-to-propagate-derivatives-to-global-buffer-storage) in the last tutorial to refresh your memory. Another reason is that our layer is just a matrix multiplication with bias, so its derivative is quite simple, and there are many options to accelerate it on specific hardware (e.g. NVIDIA tensor cores). Therefore, it's good practice to implement the custom derivative.

First, let's revisit the mathematical formula. Given:
$$Z=W\cdot x+b$$
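
For reference, applying the chain rule to this affine layer gives the quantities the custom backward pass has to produce (writing $\delta = \frac{\partial loss}{\partial Z}$ for the gradient arriving from the layer's output):

$$\frac{\partial loss}{\partial W} = \delta \cdot x^{T}, \qquad \frac{\partial loss}{\partial b} = \delta, \qquad \frac{\partial loss}{\partial x} = W^{T} \cdot \delta$$

The first term is the outer product accumulated into the weight gradients, and the last is the transposed matrix multiplication that propagates the gradient back to the layer's input; these correspond to the `outerProductAccumulate` and `matMulTransposed` helpers mentioned below.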
@@ -279,7 +279,7 @@ InterlockedAddF16Emulated(biasesGrad + i, dResult.data[i], originalValue);

In the forward pass, we already know that the parameters stored in the global storage buffer are shared by all threads, and so are the gradients. Therefore, during the backward pass each thread accumulates its gradient data into the shared storage buffer, so we must use atomic adds to accumulate all the gradients without race conditions.

-The implementation of `outerProductAccumulate` and `matMulTransposed` are just trivial for-loop multiplication, so we will not show the details in this tutorial, the complete code can be found at **[github-link]**.
+The implementations of `outerProductAccumulate` and `matMulTransposed` are just trivial for-loop multiplications, so we will not show the details in this tutorial; the complete code can be found [here](https://github.com/shader-slang/neural-shading-s25/tree/main/hardware-acceleration/mlp-training).
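
Conceptually, and ignoring the generic dimensions and the atomic accumulation into the gradient buffers that the real code performs, the two helpers reduce to loops along these lines (the signatures and the row-major weight layout are assumptions for illustration):

```hlsl
// Illustrative only: assumes a 4x4 layer with weights stored row-major in a
// flat half array, and that MLVec exposes its elements through `data`.

// Accumulate the outer product dZ * x^T into a weight-gradient matrix.
void outerProductAccumulate(MLVec<4> dZ, MLVec<4> x, inout half wGrad[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            wGrad[r * 4 + c] += dZ.data[r] * x.data[c];
}

// Compute W^T * dZ to propagate the gradient back to the layer's input.
MLVec<4> matMulTransposed(half weights[16], MLVec<4> dZ)
{
    MLVec<4> dx;
    for (int c = 0; c < 4; c++)
    {
        half sum = 0.0h;
        for (int r = 0; r < 4; r++)
            sum += weights[r * 4 + c] * dZ.data[r];
        dx.data[c] = sum;
    }
    return dx;
}
```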

### 4.3 Make the Vector Differentiable

@@ -370,4 +370,4 @@ void adjustParameters(

The training process is a loop that alternately invokes the Slang compute kernels `learnGradient` and `adjustParameters` until the loss converges to an acceptable threshold value.

-We will skip the host side implementation for the MLP example, it will be boilerplate code that setup graphics pipeline and allocate buffers for parameters. You can access the [github link] for complete code of this example which includes the host side implementation. Alternatively, you can try to use the more powerful tool [SlangPy](https://github.com/shader-slang/slangpy) to run this MLP example without writing any graphics boilerplate code.
+We will skip the host-side implementation for the MLP example, as it is boilerplate code that sets up the graphics pipeline and allocates buffers for the parameters. You can access the [github repo](https://github.com/shader-slang/neural-shading-s25/tree/main/hardware-acceleration/mlp-training) for the complete code of this example, which includes the host-side implementation. Alternatively, you can try the more powerful tool [SlangPy](https://github.com/shader-slang/slangpy) to run this MLP example without writing any graphics boilerplate code.
