Nested Function Calls in TC #56
Comments
also came through #132 |
Hi everyone (cc @zdevito for language perspective input), I have been thinking about this problem and looking at the code. I wrote down a proposal for it: https://gist.github.com/prigoyal/3c411fcf994c0069a11f9df0733e70f2 and would appreciate any feedback/comments. Also cc @salexspb |
Why don't you paste the content of that gist in the discussion? It's much better to keep everything in a single place. |
# Design: Support for Nested Function Calls in TC

**WIP**: thoughts, comments welcome

Let's say we have two layers, fully connected and ReLU, that we would like to fuse together. We express the fully connected and ReLU layers in TC as below:

```python
def fully_connected(float(B, M) I, float(N, M) W1, float(N) B1) -> (O1) {
    O1(b, n) +=! I(b, m) * W1(n, m)
    O1(b, n) = O1(b, n) + B1(n)
}
def relu(float(B, N) I) -> (O1) {
    O1(b, n) = fmax(I(b, n), 0)
}
```

Currently, we write the fused fully connected + ReLU layer as below:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) +=! I(b, m) * W1(n, m)
    O1(b, n) = O1(b, n) + B1(n)
    O1(b, n) = fmax(O1(b, n), 0)
}
```

This means that TC is a write-only language, i.e. we have to explicitly write out the operations whether we fuse or not. It would be nice if we could just write:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
    O1(b, n) = relu(O1(b, n), 0)
}
```

OR

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = relu(fully_connected(I(b,m), W1(n,m), B1(n)))
}
```

The benefits of doing this are:

In order to achieve the above, there are two approaches:

While 1) is an easy approach, it is also a hack that can get ugly and error-prone. Approach 2) is the most natural way to handle this properly. Let's see how to solve this with 2). We will consider the TC:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
    O1(b, n) = relu(O1(b, n), 0)
}
```
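To check that the fused `fcrelu` really computes the composition of the two separate definitions, here is a small NumPy model of the TC semantics above (illustrative only; the function bodies mirror the TC statements, not the actual TC runtime):

```python
# Hypothetical NumPy model of the TC definitions above, to check that the
# hand-fused fcrelu matches relu(fully_connected(...)). Shapes follow the
# TC signatures: I is (B, M), W1 is (N, M), B1 is (N,).
import numpy as np

def fully_connected(I, W1, B1):
    # O1(b, n) +=! I(b, m) * W1(n, m), then O1(b, n) = O1(b, n) + B1(n)
    return I @ W1.T + B1

def relu(X):
    # O1(b, n) = fmax(I(b, n), 0)
    return np.maximum(X, 0.0)

def fcrelu(I, W1, B1):
    # The hand-fused version: the same statements written out in one def.
    O1 = I @ W1.T + B1
    return np.maximum(O1, 0.0)

rng = np.random.default_rng(0)
B, M, N = 4, 8, 5
I = rng.standard_normal((B, M))
W1 = rng.standard_normal((N, M))
B1 = rng.standard_normal(N)

assert np.allclose(fcrelu(I, W1, B1), relu(fully_connected(I, W1, B1)))
```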
|
On Tue, May 08, 2018 at 08:57:07PM +0000, Priya Goyal wrote:
> # Design: Support for Nested Function calls in TC
> **WIP**: thoughts, comments welcome
> Let's say we have two layers: fully connected and relu layer that we would like to fuse together. We express the fully connected layer and Relu layer in TC as below:
> ```python
> def fully_connected(float(B, M) I, float(N, M) W1, float(N) B1) -> (O1) {
>   O1(b, n) +=! I(b, m) * W1(n, m)
>   O1(b, n) = O1(b, n) + B1(n)
> }
> def relu(float(B, M) I) -> (O1) {
>   O1(b, n) *=* fmax(I(b, n), 0)
> }
> ```
> Currently, we write fused Fully connected - relu layer as below:
> ```python
> def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
>   O1(b, n) +=! I(b, m) * W1(n, m)
>   O1(b, n) = O1(b, n) + B1(n)
>   O1(b, n) = fmax(O1(b, n), 0)
> }
> ```
> This means that TC is write-only language i.e. we have to explicitly express the operations by writing them irrespective of fusion or not. It would be nice if we could just write:
> ```python
> def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
>   O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
This looks as if `fully_connected` is performing a _scalar_ operation.
I would expect a call like `O = fully_connected(I, W1, B1)`
for the `fully_connected` defined above.
skimo
|
Actually, this |
This is a very useful feature indeed. A couple of comments.
We have a different definition of write-only, so let's not abuse the term :) TC does not support reuse of functions natively (favoring copy-pasting of code), but it is designed to be readable, which to me is the opposite of write-only.
I'd need to see an example of that style. We don't want TC to end up as yet another framework.
We claim we are doing better than other tools because we don't do string stitching :)
Yep, let's call it
This is mostly irrelevant here. Assume a program that has all the relevant definitions lexically before the first calls. We can think about modules and libraries once the base version works. And a couple of questions:
|
We should be fine with just implementing temporary support with a custom allocator and then having a function that parses and queries the temporary size. We can then allocate scratch space (with the delegated allocator, which won't call cudaMalloc) and put temporaries in there. I think @zdevito made this point a few months ago. But we will probably also need to support things like:
and have extra support for naming "hidden" temporaries and multiple outputs / multiple inputs, which will get us into fun error-reporting situations.
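The delegated-allocator idea can be sketched as a simple bump allocator: query the total temporary size once, grab one backing buffer, then hand out aligned offsets without any further cudaMalloc calls. All names here are hypothetical, not the actual TC API:

```python
# Hypothetical sketch of a bump allocator for TC temporaries. In the real
# system the single backing buffer would come from one cudaMalloc; here we
# only track byte offsets into it.
class ScratchAllocator:
    def __init__(self, total_bytes, alignment=256):
        self.total = total_bytes
        self.alignment = alignment
        self.offset = 0

    def alloc(self, nbytes):
        # Round the current offset up to the alignment boundary.
        start = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        if start + nbytes > self.total:
            raise MemoryError("scratch space exhausted")
        self.offset = start + nbytes
        return start  # offset into the one pre-allocated device buffer

# e.g. two temporaries carved out of one pre-sized arena:
arena = ScratchAllocator(total_bytes=1 << 20)
t0 = arena.alloc(4096)   # offset 0
t1 = arena.alloc(100)    # offset 4096, aligned to 256
```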
We just need strided tensor support; a slice of a tensor is a metadata operation. Coming up with good semantics that are not too framework-y is another story. Reusability vs. implicit semantics is a tradeoff (think broadcast and reduce semantics... gahhhh). For one, I would much rather lose reusability than see implicit semantics.
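The "slice is a metadata operation" point is the same behavior NumPy exhibits: a basic slice changes shape and strides but shares the underlying storage, so no data moves. A small illustration (NumPy, not TC):

```python
# A basic slice is a strided view: only shape/strides metadata changes,
# the storage is shared with the original array.
import numpy as np

a = np.arange(24, dtype=np.float32).reshape(4, 6)
s = a[1:3, ::2]                 # no data is copied
assert np.shares_memory(a, s)   # same storage
assert s.shape == (2, 3)
assert s.strides == (24, 8)     # byte strides: rows 6*4, columns 2*4

s[0, 0] = -1.0                  # writing through the view mutates the original
assert a[1, 0] == -1.0
```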
That's a pretty tricky one: do we want to support operations on elements, matrices, or tensors? Extrapolate: what does it mean to multiply two tensors? Then you get into the business of having to name your functions to express their intent (conv2dNCHW) and poof, you're back to being a framework and you have lost the niceness of your explicit mathematical language. If we get too extreme in that direction, we might as well do an ONNX -> ISL translation and save ourselves some trouble.
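The ambiguity is easy to demonstrate outside TC: given two matrices, "multiply" already has at least two common meanings, which is exactly why frameworks end up encoding intent in operator names. A NumPy illustration (not TC code):

```python
# Two common meanings of "multiply two matrices":
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = A * B   # Hadamard product: no index is contracted
matmul = A @ B        # matrix product: contracts over the shared index
assert not np.array_equal(elementwise, matmul)

# TC sidesteps this by making the contraction explicit in the indexing:
#   O(i, j) +=! A(i, k) * B(k, j)
# is unambiguous without conv2dNCHW-style naming.
```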
This one feels like the type of transformation I was mentioning earlier, which I would now qualify as expression propagation. It comes with potential tradeoffs between storage and compute complexity. Expressed as is, it is also related to function inlining, but it has more flavor to it.
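The storage-vs-compute tradeoff behind expression propagation can be shown with a toy example: propagating a producer's defining expression into each of its use sites removes the temporary buffer but may recompute the expression. A hypothetical illustration, not TC semantics:

```python
# Toy illustration of "expression propagation": inlining a producer into
# its consumers trades storage for (possibly repeated) compute.
import numpy as np

x = np.linspace(-1.0, 1.0, 8)

# Version A: materialize the temporary T, then read it twice.
T = np.exp(x)                          # one temporary buffer, exp computed once
out_a = T + 1.0 / T

# Version B: propagate the defining expression into each use site.
out_b = np.exp(x) + 1.0 / np.exp(x)    # no temporary, exp computed twice

assert np.allclose(out_a, out_b)       # same values, different cost profile
```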
Well, all the interesting problems about temporaries etc. are hidden by the fact that the example chosen is trivial and can be expressed as a simple map combinator. IMO something like
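One way to read the map-combinator remark (a hypothetical sketch in Python, not the commenter's actual proposal): `relu` is a pointwise function, so `fcrelu` is just `fully_connected` followed by a map, and no temporary-allocation machinery is exercised.

```python
# Hypothetical sketch: relu as a pointwise map composed with a producer.
import numpy as np

def pointwise(f):
    """Lift a scalar function to a map over a whole tensor."""
    return lambda t: np.vectorize(f)(t)

relu = pointwise(lambda v: max(v, 0.0))

def fcrelu(I, W1, B1):
    # fully_connected followed by a pointwise map; no extra temporaries
    # beyond the producer's output.
    return relu(I @ W1.T + B1)
```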
Don't? Please, no stack on the GPU :p |
There is some interest in being able to do nested function calls, i.e. define one TC and call it from inside another one.
This also makes a lot of sense for cases where you want to draw on a database of predefined TCs and just make calls.
[EDIT]: pasting here @leonbottou 's remarks: