This repository was archived by the owner on Nov 15, 2022. It is now read-only.

Regarding the performance of tensorwise #17

@justanhduc

Description

I found out that tensorwise actually just runs a Python for loop over the constituents of the nested tensors. I benchmarked tensorwise against map, a list comprehension, and an explicit for loop. (Un)surprisingly, tensorwise performs much slower than all of them. Here is the benchmark:
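To make the claim concrete, here is a rough sketch of what a tensorwise-style decorator amounts to under this reading. The names are illustrative, not the actual nestedtensor internals: the point is only that the wrapped function is called once per constituent in a Python-level loop, so per-call overhead dominates when there are many small tensors.

```python
def tensorwise_sketch(fn):
    """Illustrative stand-in for nt.tensorwise: apply fn pairwise
    over the constituents of two nested containers."""
    def wrapper(a, b):
        # One Python-level call of `fn` per constituent pair --
        # nothing is batched or fused across the loop.
        return [fn(a_, b_) for a_, b_ in zip(a, b)]
    return wrapper


@tensorwise_sketch
def add(x, y):
    return x + y
```

With this sketch, `add([1, 2], [3, 4])` returns `[4, 6]`: each pair is handled by a separate interpreted call, which is exactly the overhead the benchmark below measures.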

import torch as T
import nestedtensor as nt

crit = lambda x, y: T.mean((x - y) ** 2)


@nt.tensorwise()
def loss_nt(a, b):
    return crit(a, b)


def loss_map(a, b):
    return sum(map(crit, a, b)) / len(a)


def loss_for(a, b):
    return sum([crit(a_, b_) for a_, b_ in zip(a, b)]) / len(a)


def loss_expfor(a, b):
    loss = []
    for a_, b_ in zip(a, b):
        loss.append(crit(a_, b_))
    return sum(loss) / len(loss)


p1 = T.arange(64 * 5000 * 3).cuda().view(64, 5000, 3).float()
p2 = T.arange(64 * 5000 * 3).cuda().view(64, 5000, 3).float()

p1_list = list(p1[:, None])
p2_list = list(p2[:, None])

p1_nt = nt.as_nested_tensor(p1_list).cuda()
p2_nt = nt.as_nested_tensor(p2_list).cuda()

start = T.cuda.Event(enable_timing=True)
end = T.cuda.Event(enable_timing=True)

for i in range(100):
    start.record()
    loss_nt(p1_nt, p2_nt)
    end.record()
    T.cuda.synchronize()
    total_nt = start.elapsed_time(end)

    start.record()
    loss_map(p1_list, p2_list)
    end.record()
    T.cuda.synchronize()
    total_map = start.elapsed_time(end)

    start.record()
    loss_for(p1_list, p2_list)
    end.record()
    T.cuda.synchronize()
    total_for = start.elapsed_time(end)

    start.record()
    crit(p1, p2)  # batched baseline: a single fused op over the dense tensors
    end.record()
    T.cuda.synchronize()
    total = start.elapsed_time(end)

    start.record()
    loss_expfor(p1_list, p2_list)
    end.record()
    T.cuda.synchronize()
    total_expfor = start.elapsed_time(end)

    print(i, total_nt, total_map, total_for, total_expfor, total)

Is this because tensorwise is not implemented in C++ yet?
If the current implementation of tensorwise is final, then I wonder whether tensorwise is intended only for convenience rather than for performance?
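For comparison, when every constituent has the same shape (as in the benchmark above, where each slice is (1, 5000, 3)), the per-constituent loop can be replaced by one batched reduction. A minimal sketch of that equivalence, shown with NumPy here purely for illustration; the torch analogue is the `crit(p1, p2)` baseline already in the benchmark:

```python
import numpy as np


def loss_loop(a, b):
    # Per-constituent MSE, averaged over constituents -- the pattern
    # that loss_map / loss_for / a tensorwise-style loop all follow.
    return sum(((x - y) ** 2).mean() for x, y in zip(a, b)) / len(a)


def loss_batched(a, b):
    # One fused reduction over the stacked batch. Because every
    # constituent has the same number of elements, the overall mean
    # equals the average of the per-constituent means.
    return ((np.stack(a) - np.stack(b)) ** 2).mean()
```

For equal-shaped constituents the two agree numerically, but `loss_batched` launches one kernel instead of `len(a)` of them, which is where the gap in the timings above comes from.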

Metadata

Labels: perf (performance related issues)