Nested Function Calls in TC #56
Comments
also came through #132 |
Hi everyone (cc @zdevito for language perspective input), I have been thinking about this problem and looking at the code. I wrote down a proposal for it: https://gist.github.com/prigoyal/3c411fcf994c0069a11f9df0733e70f2 and would appreciate any feedback/comments. Also cc @salexspb |
Why don't you paste the content of that gist in the discussion? It's much better to keep everything in a single place. |
# Design: Support for Nested Function Calls in TC

**WIP**: thoughts, comments welcome

Let's say we have two layers, fully connected and ReLU, that we would like to fuse together. We express the fully connected and ReLU layers in TC as below:

```python
def fully_connected(float(B, M) I, float(N, M) W1, float(N) B1) -> (O1) {
    O1(b, n) +=! I(b, m) * W1(n, m)
    O1(b, n) = O1(b, n) + B1(n)
}
def relu(float(B, N) I) -> (O1) {
    O1(b, n) = fmax(I(b, n), 0)
}
```

Currently, we write the fused fully connected + ReLU layer as below:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) +=! I(b, m) * W1(n, m)
    O1(b, n) = O1(b, n) + B1(n)
    O1(b, n) = fmax(O1(b, n), 0)
}
```

This means that TC is a write-only language, i.e. we have to explicitly write out the operations whether we fuse or not. It would be nice if we could just write:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
    O1(b, n) = relu(O1(b, n), 0)
}
```

OR

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = relu(fully_connected(I(b,m), W1(n,m), B1(n)))
}
```

The benefits of doing this are:

In order to achieve the above, there are two approaches:

While 1) is an easy approach, it is also a hack that can get ugly and error-prone. Approach 2) is the most natural way to handle this properly. Let's see how to solve this with 2). We will consider the TC:

```python
def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
    O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
    O1(b, n) = relu(O1(b, n), 0)
}
```
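To check that the fused `fcrelu` really computes the composition of the two separate definitions, here is a small NumPy model of the TC semantics above (illustrative only; the function bodies mirror the TC statements, not the actual TC runtime):

```python
# Hypothetical NumPy model of the TC definitions above, to check that the
# hand-fused fcrelu matches relu(fully_connected(...)). Shapes follow the
# TC signatures: I is (B, M), W1 is (N, M), B1 is (N,).
import numpy as np

def fully_connected(I, W1, B1):
    # O1(b, n) +=! I(b, m) * W1(n, m), then O1(b, n) = O1(b, n) + B1(n)
    return I @ W1.T + B1

def relu(X):
    # O1(b, n) = fmax(I(b, n), 0)
    return np.maximum(X, 0.0)

def fcrelu(I, W1, B1):
    # The hand-fused version: the same statements written out in one def.
    O1 = I @ W1.T + B1
    return np.maximum(O1, 0.0)

rng = np.random.default_rng(0)
B, M, N = 4, 8, 5
I = rng.standard_normal((B, M))
W1 = rng.standard_normal((N, M))
B1 = rng.standard_normal(N)

assert np.allclose(fcrelu(I, W1, B1), relu(fully_connected(I, W1, B1)))
```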
|
On Tue, May 08, 2018 at 08:57:07PM +0000, Priya Goyal wrote:
> # Design: Support for Nested Function calls in TC
> **WIP**: thoughts, comments welcome
> Let's say we have two layers: fully connected and relu layer that we would like to fuse together. We express the fully connected layer and Relu layer in TC as below:
> ```python
> def fully_connected(float(B, M) I, float(N, M) W1, float(N) B1) -> (O1) {
>   O1(b, n) +=! I(b, m) * W1(n, m)
>   O1(b, n) = O1(b, n) + B1(n)
> }
> def relu(float(B, M) I) -> (O1) {
>   O1(b, n) *=* fmax(I(b, n), 0)
> }
> ```
> Currently, we write fused Fully connected - relu layer as below:
> ```python
> def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
>   O1(b, n) +=! I(b, m) * W1(n, m)
>   O1(b, n) = O1(b, n) + B1(n)
>   O1(b, n) = fmax(O1(b, n), 0)
> }
> ```
> This means that TC is write-only language i.e. we have to explicitly express the operations by writing them irrespective of fusion or not. It would be nice if we could just write:
> ```python
> def fcrelu(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
>   O1(b, n) = fully_connected(I(b,m), W1(n,m), B1(n))
This looks as if `fully_connected` is performing a _scalar_ operation.
I would expect a call like `O = fully_connected(I, W1, B1)`
for the `fully_connected` defined above.
skimo
|
Actually, this |
This is a very useful feature indeed. A couple of comments.
We have a different definition of write-only, so let's not abuse the term :) TC does not support reuse of functions natively (favoring copy-pasting of code), but it is designed to be readable, which to me is the opposite of write-only.
I'd need to see an example of that style. We don't want TC to end up as yet another framework.
We claim we are doing better than other tools because we don't do string stitching :)
Yep, let's call it
This is mostly irrelevant here. Assume a program that has all the relevant definitions lexically before the first calls. We can think about modules and libraries once the base version works. And a couple of questions:
|
We should be fine with just implementing temporary support with a custom allocator and then having a function that parses and queries the temporary size. We can then allocate scratch space (with the delegated allocator, which won't call cudaMalloc) and put temporaries in there. I think @zdevito made this point a few months ago. But we will probably also need to support things like:
and have extra support for naming "hidden" temporaries and multiple outputs / multiple inputs, which will get us into fun error-reporting situations.
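The delegated-allocator idea can be sketched as a simple bump allocator: query the total temporary size once, grab one backing buffer, then hand out aligned offsets without any further cudaMalloc calls. All names here are hypothetical, not the actual TC API:

```python
# Hypothetical sketch of a bump allocator for TC temporaries. In the real
# system the single backing buffer would come from one cudaMalloc; here we
# only track byte offsets into it.
class ScratchAllocator:
    def __init__(self, total_bytes, alignment=256):
        self.total = total_bytes
        self.alignment = alignment
        self.offset = 0

    def alloc(self, nbytes):
        # Round the current offset up to the alignment boundary.
        start = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        if start + nbytes > self.total:
            raise MemoryError("scratch space exhausted")
        self.offset = start + nbytes
        return start  # offset into the one pre-allocated device buffer

# e.g. two temporaries carved out of one pre-sized arena:
arena = ScratchAllocator(total_bytes=1 << 20)
t0 = arena.alloc(4096)   # offset 0
t1 = arena.alloc(100)    # offset 4096, aligned to 256
```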
We just need strided tensor support; a slice of a tensor is a metadata operation. Coming up with good semantics that are not too framework-y is another story. Reusability vs. implicit semantics is a tradeoff (think broadcast and reduce semantics... gahhhh). For one, I would much rather lose reusability than see implicit semantics.
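The "slice is a metadata operation" point is the same behavior NumPy exhibits: a basic slice changes shape and strides but shares the underlying storage, so no data moves. A small illustration (NumPy, not TC):

```python
# A basic slice is a strided view: only shape/strides metadata changes,
# the storage is shared with the original array.
import numpy as np

a = np.arange(24, dtype=np.float32).reshape(4, 6)
s = a[1:3, ::2]                 # no data is copied
assert np.shares_memory(a, s)   # same storage
assert s.shape == (2, 3)
assert s.strides == (24, 8)     # byte strides: rows 6*4, columns 2*4

s[0, 0] = -1.0                  # writing through the view mutates the original
assert a[1, 0] == -1.0
```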
That's a pretty tricky one: do we want to support operations on elements, matrices, or tensors? Extrapolate: what does it mean to multiply two tensors? Then you get into the business of having to name your functions to express their intent (conv2dNCHW) and poof, you're back to being a framework and you have lost the niceness of your explicit mathematical language. If we get too extreme in that direction, we might as well do an ONNX -> ISL translation and save ourselves some trouble.
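The ambiguity is easy to demonstrate outside TC: given two matrices, "multiply" already has at least two common meanings, which is exactly why frameworks end up encoding intent in operator names. A NumPy illustration (not TC code):

```python
# Two common meanings of "multiply two matrices":
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = A * B   # Hadamard product: no index is contracted
matmul = A @ B        # matrix product: contracts over the shared index
assert not np.array_equal(elementwise, matmul)

# TC sidesteps this by making the contraction explicit in the indexing:
#   O(i, j) +=! A(i, k) * B(k, j)
# is unambiguous without conv2dNCHW-style naming.
```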
This one feels like the type of transformation I was mentioning earlier, which I would now qualify as expression propagation. It comes with potential tradeoffs between storage and compute complexity. Expressed as is, it is also related to function inlining, but it has more flavor to it.
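The storage-vs-compute tradeoff behind expression propagation can be shown with a toy example: propagating a producer's defining expression into each of its use sites removes the temporary buffer but may recompute the expression. A hypothetical illustration, not TC semantics:

```python
# Toy illustration of "expression propagation": inlining a producer into
# its consumers trades storage for (possibly repeated) compute.
import numpy as np

x = np.linspace(-1.0, 1.0, 8)

# Version A: materialize the temporary T, then read it twice.
T = np.exp(x)                          # one temporary buffer, exp computed once
out_a = T + 1.0 / T

# Version B: propagate the defining expression into each use site.
out_b = np.exp(x) + 1.0 / np.exp(x)    # no temporary, exp computed twice

assert np.allclose(out_a, out_b)       # same values, different cost profile
```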
Well, all the interesting problems about temporaries etc. are hidden by the fact that the example chosen is trivial and can be expressed as a simple map combinator. IMO something like
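One way to read the map-combinator remark (a hypothetical sketch in Python, not the commenter's actual proposal): `relu` is a pointwise function, so `fcrelu` is just `fully_connected` followed by a map, and no temporary-allocation machinery is exercised.

```python
# Hypothetical sketch: relu as a pointwise map composed with a producer.
import numpy as np

def pointwise(f):
    """Lift a scalar function to a map over a whole tensor."""
    return lambda t: np.vectorize(f)(t)

relu = pointwise(lambda v: max(v, 0.0))

def fcrelu(I, W1, B1):
    # fully_connected followed by a pointwise map; no extra temporaries
    # beyond the producer's output.
    return relu(I @ W1.T + B1)
```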
Don't? Please, no stack on the GPU :p |
There is some interest in being able to do nested function calls, i.e. define one TC and call it from inside another one.
This also makes a lot of sense for cases where you want to draw on a database of predefined TCs and just make calls.
[EDIT]: pasting here @leonbottou 's remarks: