TC coding guide #215
(force-pushed from 4f47765 to 2a65742)
@caffe2bot retest this please
> Lowering such operations efficient from TC is the subject of future work.
>
> Prefix gradient tensors names with :code:`g_`
> ---------------------------------------------
I can of course make AD follow this pattern; however, provided we disallow mutating inputs, this is not the whole story. Consider a variant of matmul that also has a second output, which is the doubled result:
```
def mm(double(M, K) A, double(K, N) B) -> (O, W) {
    O(i, j) +=! A(i, k) * B(k, j)
    W(i, j) = O(i, j) * 2
}
```
So far so good; however, if you want to apply reverse-mode AD to it, you will need to add the gradient w.r.t. W to the gradient w.r.t. O:
```
def grad_mm(double(M, K) A, double(K, N) B,
            double(M, N) O, double(M, N) W,
            double(M, N) seed_d_O, double(M, N) seed_d_W) -> (d_A, d_B) {
    d_O(i, j) = seed_d_O(i, j)
    d_O(i, j) += (seed_d_W(i, j) * 2)
    d_A(i, k) = 0
    d_A(i, k) += (d_O(i, j) * B(k, j))
    d_B(k, j) = 0
    d_B(k, j) += (d_O(i, j) * A(i, k))
}
```
Here you can see that the arguments are gradient "seeds" (this is the name sometimes used in the literature, IIRC) and might not reflect the real gradient of a particular value alone, as can be seen in the case of O. On the other hand, it is true that the seed is the real gradient of W, which is why my code skips instantiating d_W and uses the seed alone.
If you're writing down the conventions for gradient TCs then it would be good to take that into account. Otherwise, if you're fine with what I did for AD, I can just call them seed_g_<name>.
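(For concreteness, a minimal sketch of how the same grad_mm would read under a seed_g_ naming scheme; only the prefixes change, and all names here are hypothetical:)

```
def grad_mm(double(M, K) A, double(K, N) B,
            double(M, N) O, double(M, N) W,
            double(M, N) seed_g_O, double(M, N) seed_g_W) -> (g_A, g_B) {
    g_O(i, j) = seed_g_O(i, j)
    g_O(i, j) += (seed_g_W(i, j) * 2)
    g_A(i, k) = 0
    g_A(i, k) += (g_O(i, j) * B(k, j))
    g_B(k, j) = 0
    g_B(k, j) += (g_O(i, j) * A(i, k))
}
```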
I'd prefer grad_ over g_. Otherwise it smells like Hungarian notation, and that is a questionable choice.
How about @apaszke's d_? It is the actual mathematical notation, so that should settle it.
@apaszke the seed stuff lgtm, no need to change IMO
docs/source/coding_conventions.rst
Outdated
> C(m, n) +=! A(m, k) * B(k, n)
> }
> Filter non-rectangular regions with deta-dependencies
data
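(The heading being fixed here, "Filter non-rectangular regions with data-dependencies", refers to restricting iteration spaces via `where` clauses. A minimal sketch of the idea, using the range syntax quoted later in this review; the name tril_mv is made up:)

```
def tril_mv(float(M, M) L, float(M) x) -> (y) {
    y(i) +=! L(i, r_j) * x(r_j) where r_j in 0:i+1
}
```

Only the lower-triangular part of L is read, because the reduction range of r_j depends on the row index i.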
docs/source/coding_conventions.rst
Outdated
> The reader may remark that this is an inefficient way of performing
> matrix-multiplication of triangular matrices.
> Lowering such operations efficient from TC is the subject of future work.
efficiently
CodingConventions.md
Outdated
> Please see the following documentation
> [entry](https://facebookresearch.github.io/TensorComprehensions/coding_conventions.html)
> on how to write Tensor Comprehensions in a standard, legible, fashion.
I'd remove commas here.
docs/doxygen/index.md
Outdated
> stores into `o` will reduce over `j` with the reduction specified for the loop.
> The statement `o(r) +=! A(r,r_c) * x(r_c)` introduces two index variables `r` and `r_c`.
> Their range is inferred by their use indexing `A` and `x`. `r = [0,R)`, `r_c = [0,C)`.
> Because `r_c` only appears on the right side,
right hand side?
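(For reference, the statement under discussion written out as a complete definition; a sketch assuming `A` is `float(R, C)` and `x` is `float(C)`:)

```
def mv(float(R, C) A, float(C) x) -> (o) {
    o(r) +=! A(r, r_c) * x(r_c)
}
```

Here `r` ranges over [0,R) and `r_c` over [0,C); since `r_c` appears only on the right-hand side, it is the reduction variable.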
docs/source/coding_conventions.rst
Outdated
> In order to increase readability across Tensor Comprehensions written by
> multiple authors and to reduce the amount of surprising behavior, the
> following conventions should be adopted when writing TC. Generally in TC one
Comma after "Generally in TC"
> In order to increase readability across Tensor Comprehensions written by
> multiple authors and to reduce the amount of surprising behavior, the
> following conventions should be adopted when writing TC. Generally in TC one
> should increment nesting by 4 whitespaces at each level and align tensor names
mention that we use spaces and not tabs
(never miss an opportunity to start a holy war)
> multiple authors and to reduce the amount of surprising behavior, the
> following conventions should be adopted when writing TC. Generally in TC one
> should increment nesting by 4 whitespaces at each level and align tensor names
> and indices where appropriate to make memory access patterns emerge. Since
does this mean that inline whitespace is encouraged? mention this specifically?
Is this example what you want? Do we encourage inline whitespace on both the LHS and the RHS?

```
A(i,j+1) += B(i, j  , k)
C(i,j)   += B(i, j-1, k)
```
unspecified, use your best judgement; I'd certainly not limit whitespacing to only the RHS.
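(To make both rules concrete, a small hypothetical example with 4-space indentation and indices aligned so the parallel access pattern stays visible:)

```
def add_mul(float(N) A, float(N) B) -> (S, P) {
    S(i) = A(i) + B(i)
    P(i) = A(i) * B(i)
}
```

Per the earlier comment, the indentation is four spaces, never tabs.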
docs/source/coding_conventions.rst
Outdated
> LU(m1, m2) +=! L(m1, r_k) * U(r_k, m2) where r_k in m1:M, r_k in 0:m2+1
> }
> The reader may remark that this is an inefficient way of performing
Why inefficient? How is this relevant to the style? Code can perfectly well respect the coding guidelines yet remain inefficient...
same comment as above; please let me know if you want me to move this to another section in the docs
docs/source/coding_conventions.rst
Outdated
> ---------------------------------------------
>
> When implementing backward operations, pass the inputs to the backwards pass
> in the same order as the outputs to the forward passs and use the same tensor
outputs of the forward pass?
docs/source/coding_conventions.rst
Outdated
> ...
> }
> def conv_bw(float(N,C,H,W) I, float(M,C,KH,KW) Wt, float(N,M,HO,WO) g_O) -> (g_I) {
shouldn't this be called g_conv as per the previous item?
no particular request on the function name on my end
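(Whatever the final name, the ordering convention under discussion pairs the two signatures roughly as follows; a hypothetical unstrided sketch modeled on the quoted diff:)

```
def conv(float(N,C,H,W) I, float(M,C,KH,KW) Wt) -> (O) {
    O(n, m, h, w) +=! I(n, r_c, h + r_kh, w + r_kw) * Wt(m, r_c, r_kh, r_kw)
}

def conv_bw(float(N,C,H,W) I, float(M,C,KH,KW) Wt, float(N,M,HO,WO) g_O) -> (g_I) {
    g_I(n, c, h, w) +=! g_O(n, r_m, h - r_kh, w - r_kw) * Wt(r_m, c, r_kh, r_kw)
}
```

The backward pass takes the forward inputs first, in their original order, followed by the gradients of the forward outputs.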
> I_grad(n, c, h, w) +=! O_grad(n, m, {sh} * h - kh, {sw} * w - kw) * W1(m, c, kh, kw)
> W1_grad(m, c, kh, kw) +=! O_grad(n, m, {sh} * h - kh, {sw} * w - kw) * I(n, c, h, w)
> def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) g_O) -> (g_I, g_W1) {{
> g_I(n, c, h, w) +=! g_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
we never said that we prefer aligning

```
X(i, j)
FOO(m,n)
```

over

```
X  (i,j)
FOO(i,j)
```
we didn't, use your best judgement, do you see an issue with this alignment?
(force-pushed from 2a65742 to 37bf9bc)
This introduces a coding style guide for TC and addresses #109.
It then goes on to update various places in the codebase where TCs are hard-coded.