
Conversation

@nicolasvasilache (Contributor):

This introduces a coding style guide for TC and addresses #109.
It then updates various places in the codebase where TCs are hard-coded.

@nicolasvasilache (Contributor, Author):

@caffe2bot retest this please

Lowering such operations efficient from TC is the subject of future work.

Prefix gradient tensor names with :code:`g_`
---------------------------------------------
Contributor:

I can of course make AD follow this pattern, however, provided we disallow mutating inputs, this is not the whole story. Consider a variant of matmul that also has a second output which is the doubled result:

def mm(double(M, K) A, double(K, N) B) -> (O, W) {
  O(i, j) +=! A(i, k) * B(k, j)
  W(i, j) = O(i, j) * 2
}

So far so good, however if you want to apply reverse mode AD to it, you will need to add gradients w.r.t. W to the gradient w.r.t. O:

def grad_mm(double(M, K) A, double(K, N) B,
            double(M, N) O, double(M, N) W,
            double(M, N) seed_d_O, double(M, N) seed_d_W) -> (d_A, d_B) {
  d_O(i, j) = seed_d_O(i, j)
  d_O(i, j) += (seed_d_W(i, j) * 2)
  d_A(i, k) = 0
  d_A(i, k) += (d_O(i, j) * B(k, j))
  d_B(k, j) = 0
  d_B(k, j) += (d_O(i, j) * A(i, k))
}

Here, you can see that the arguments are gradient "seeds" (this is the name sometimes used in the literature, IIRC), and they might not reflect the real gradient of a particular value alone, as can be seen in the case of O. On the other hand, it is true that the seed is the real gradient of W, which is why my code skips instantiating d_W and uses the seed alone.

If you're writing down the conventions for gradient TCs then it would be good to take that into account. Otherwise, if you're fine with what I did for AD, I can just call them seed_g_<name>.
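For concreteness, this is what the signature above would look like under the seed_g_ naming; purely an illustrative sketch (the renamed identifiers are hypothetical), with the body unchanged apart from the prefixes:

def grad_mm(double(M, K) A, double(K, N) B,
            double(M, N) O, double(M, N) W,
            double(M, N) seed_g_O, double(M, N) seed_g_W) -> (g_A, g_B) {
  g_O(i, j) = seed_g_O(i, j)
  g_O(i, j) += (seed_g_W(i, j) * 2)
  g_A(i, k) = 0
  g_A(i, k) += (g_O(i, j) * B(k, j))
  g_B(k, j) = 0
  g_B(k, j) += (g_O(i, j) * A(i, k))
}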

Contributor:

I'd prefer grad_ over g_. Otherwise it smells like Hungarian notation, and that is a questionable choice.

Contributor (Author):

How about @apaszke's d_? It is the actual mathematical notation, so that should settle it.

Contributor (Author):

@apaszke the seed stuff lgtm, no need to change IMO

C(m, n) +=! A(m, k) * B(k, n)
}
Filter non-rectangular regions with deta-dependencies
Contributor:

data

The reader may remark that this is an inefficient way of performing
matrix-multiplication of triangular matrices.
Lowering such operations efficient from TC is the subject of future work.
Contributor:

efficiently

@ftynse previously requested changes on Mar 28, 2018

Please see the following documentation
[entry](https://facebookresearch.github.io/TensorComprehensions/coding_conventions.html)
on how to write Tensor Comprehensions in a standard, legible, fashion.
Contributor:

I'd remove commas here.

stores into `o` will reduce over `j` with the reduction specified for the loop.
The statement `o(r) +=! A(r,r_c) * x(r_c)` introduces two index variables `r` and `r_c`.
Their range is inferred by their use indexing `A` and `x`. `r = [0,R)`, `r_c = [0,C)`.
Because `r_c` only appears on the right side,
Contributor:

right hand side?
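For reference, a minimal sketch of the full matrix-vector TC that the quoted passage describes (assuming the sizes R and C from the text):

def mv(float(R, C) A, float(C) x) -> (o) {
    o(r) +=! A(r, r_c) * x(r_c)
}

Here r ranges over [0, R) and r_c over [0, C), both inferred from how they index A and x; since r_c appears only on the right-hand side, o is reduced over it.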


In order to increase readability across Tensor Comprehensions written by
multiple authors and to reduce the amount of surprising behavior, the
following conventions should be adopted when writing TC. Generally in TC one
Contributor:

Comma after "Generally in TC"

In order to increase readability across Tensor Comprehensions written by
multiple authors and to reduce the amount of surprising behavior, the
following conventions should be adopted when writing TC. Generally in TC one
should increment nesting by 4 whitespaces at each level and align tensor names
Contributor:

mention that we use spaces and not tabs
(never miss an opportunity to start a holy war)

multiple authors and to reduce the amount of surprising behavior, the
following conventions should be adopted when writing TC. Generally in TC one
should increment nesting by 4 whitespaces at each level and align tensor names
and indices where appropriate to make memory access patterns emerge. Since
Contributor:

does this mean that inline whitespace is encouraged? mention this specifically?

Is this example what you want? Do we encourage inline whitespace on both LHS and RHS?

A(i,j+1) += B(i, j   , k)
C(i,j)   += B(i, j-1, k)

Contributor (Author):

Unspecified, use your best judgement; I'd certainly not limit whitespacing to the RHS only.
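As one illustrative reading of that guideline (not normative, and this sgemm variant is a made-up example), with 4-space indentation and operators padded so the access patterns line up:

def sgemm(float a, float b,
          float(N, M) A, float(M, K) B, float(N, K) C) -> (O) {
    O(n, k)  = b * C(n, k)
    O(n, k) += a * A(n, r_m) * B(r_m, k)
}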

LU(m1, m2) +=! L(m1, r_k) * U(r_k, m2) where r_k in m1:M, r_k in 0:m2+1
}
The reader may remark that this is an inefficient way of performing
Contributor:

Why inefficient? How is this relevant to the style? Code can perfectly well respect the coding guidelines yet remain inefficient...

Contributor (Author):

Same comment as above; please let me know if you want me to move this to another section of the docs.


When implementing backward operations, pass the inputs to the backwards pass
in the same order as the outputs to the forward passs and use the same tensor
Contributor:

outputs of the forward pass?
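For what it's worth, a minimal hypothetical sketch of one reading of that convention (foo and foo_bw are made-up names): the backward TC takes the forward inputs first, then one gradient per forward output, in the forward outputs' order, named with the g_ prefix:

def foo(float(M, N) I) -> (O1, O2) {
    O1(m, n) = I(m, n) * 2
    O2(m, n) = I(m, n) + 1
}
def foo_bw(float(M, N) I, float(M, N) g_O1, float(M, N) g_O2) -> (g_I) {
    g_I(m, n) = g_O1(m, n) * 2 + g_O2(m, n)
}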

...
}
def conv_bw(float(N,C,H,W) I, float(M,C,KH,KW) Wt, float(N,M,HO,WO) g_O) -> (g_I) {
Contributor:

shouldn't this be called g_conv as per the previous item?

Contributor (Author):

no particular request on the function name on my end

I_grad(n, c, h, w) +=! O_grad(n, m, {sh} * h - kh, {sw} * w - kw) * W1(m, c, kh, kw)
W1_grad(m, c, kh, kw) +=! O_grad(n, m, {sh} * h - kh, {sw} * w - kw) * I(n, c, h, w)
def convolution_grad(float(N,C,H,W) I, float(M,C,KH,KW) W1, float(N,M,H,W) g_O) -> (g_I, g_W1) {{
g_I(n, c, h, w) +=! g_O( n, r_m, {sh} * h - r_kh, {sw} * w - r_kw) * W1(r_m, c, r_kh, r_kw)
Contributor:

we never said that we prefer aligning

  X(i, j)
FOO(m,n)

over

X  (i,j)
FOO(i,j)

Contributor (Author):

We didn't; use your best judgement. Do you see an issue with this alignment?

@nicolasvasilache merged commit a401619 into master on Apr 4, 2018.
@ftynse deleted the pr/tc-coding-guide branch on April 5, 2018 at 11:38.