
[LTC] Setup a subset of TorchBench for the current implementation #65386

Open
5 of 6 tasks
alanwaketan opened this issue Sep 21, 2021 · 4 comments
Assignees
Labels
module: lazy, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@alanwaketan
Collaborator

alanwaketan commented Sep 21, 2021

This is the umbrella to keep track of the status of setting up a subset of TorchBench for the current implementation. For more detailed information, please refer to this document.

Objectives of this task include:

CC @wconstab.

@alanwaketan alanwaketan created this issue from a note in Lazy Tensor Core (To do) Sep 21, 2021
@alanwaketan alanwaketan moved this from To do to In progress in Lazy Tensor Core Sep 21, 2021
@alanwaketan alanwaketan self-assigned this Sep 21, 2021
@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Sep 22, 2021
@alanwaketan
Collaborator Author

alanwaketan commented Sep 23, 2021

To figure out missing operators, we can use lazy_tensor_core.debug.metrics.counter_names().

Here is an example code snippet:

import lazy_tensor_core.debug.metrics as metrics
# some calculations...
print(metrics.counter_names())

Here is an example output:

['CachedCompile', 'CreateLtcTensor', 'DestroyLtcTensor', 'DeviceDataCacheMiss', 'UncachedCompile', 'aten::_local_scalar_dense', 'aten::abs.out', 'aten::bitwise_and.Tensor_out', 'aten::ceil.out', 'aten::masked_select', 'aten::max', 'aten::min', 'aten::normal_', 'lazy::_copy_from', 'lazy::_copy_from_and_resize', 'lazy::add', 'lazy::div', 'lazy::eq', 'lazy::gt', 'lazy::lt', 'lazy::mul', 'lazy::ne', 'lazy::select', 'lazy::view']

The counters with the 'aten::' prefix are the missing ones.
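As a minimal sketch, the missing operators can be pulled out of the counter list with a plain-Python filter. The list below is a shortened copy of the example output above; in a real run it would come from metrics.counter_names():

```python
# Shortened sample of lazy_tensor_core.debug.metrics.counter_names() output
# (copied from the example above); in practice, run the model first and
# then call metrics.counter_names().
counters = [
    'CachedCompile', 'CreateLtcTensor', 'aten::max', 'aten::min',
    'lazy::add', 'lazy::mul',
]

# Counters with the 'aten::' prefix record ops that fell back to the
# eager ATen implementation, i.e. ops still missing a lazy lowering.
missing = sorted(c for c in counters if c.startswith('aten::'))
print(missing)  # ['aten::max', 'aten::min']
```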

@alanwaketan
Collaborator Author

Looks like for hf_Bert, all operators have been lowered to TS IR. I will update TorchBench PR #456 to show the steps.

hf_Bert:

eval:
['CreateLtcTensor', 'DestroyLtcTensor', 'DeviceDataCacheMiss', 'MarkStep', 'UncachedCompile',
'lazy::_copy_from',
'lazy::_softmax',
'lazy::add',
'lazy::bmm',
'lazy::div',
'lazy::expand',
'lazy::fill_',
'lazy::gelu',
'lazy::index_select',
'lazy::mm',
'lazy::mul',
'lazy::native_batch_norm',
'lazy::permute',
'lazy::slice',
'lazy::sub',
'lazy::t',
'lazy::transpose',
'lazy::unsqueeze',
'lazy::view',
'lazy::zero_']

train:
['CreateLtcTensor', 'DestroyLtcTensor', 'DeviceDataCacheMiss', 'MarkStep', 'UncachedCompile',
'lazy::_copy_from',
'lazy::_log_softmax',
'lazy::_log_softmax_backward_data',
'lazy::_softmax',
'lazy::_softmax_backward_data',
'lazy::add',
'lazy::addcdiv_',
'lazy::bernoulli_',
'lazy::bmm',
'lazy::div',
'lazy::embedding_dense_backward',
'lazy::expand',
'lazy::fill_',
'lazy::gelu',
'lazy::gelu_backward',
'lazy::index_select',
'lazy::mm',
'lazy::mul',
'lazy::native_batch_norm',
'lazy::native_batch_norm_backward',
'lazy::nll_loss_backward',
'lazy::nll_loss_forward',
'lazy::permute',
'lazy::slice',
'lazy::sqrt',
'lazy::sub',
'lazy::sum',
'lazy::t',
'lazy::transpose',
'lazy::unsqueeze',
'lazy::view',
'lazy::zero_']

@alanwaketan
Collaborator Author

alanwaketan commented Sep 23, 2021

Here are the results from resnet50, which does have a few operators that need to be lowered:

resnet50:

eval:
['CachedCompile', 'CreateLtcTensor', 'DestroyLtcTensor', 'MarkStep', 'UncachedCompile',
'aten::convolution_overrideable',
'aten::max_pool2d_with_indices',
'aten::mean.out',
'lazy::_copy_from',
'lazy::_copy_from_and_resize',
'lazy::add',
'lazy::addmm',
'lazy::native_batch_norm',
'lazy::relu_',
'lazy::t',
'lazy::view']

train:
['CachedCompile', 'CreateLtcTensor', 'DestroyLtcTensor', 'DeviceDataCacheMiss', 'MarkStep', 'UncachedCompile',
'aten::convolution_backward_overrideable',
'aten::convolution_overrideable',
'aten::max_pool2d_with_indices',
'aten::max_pool2d_with_indices_backward',
'aten::mean.out',
'aten::random_.to',
'lazy::_copy_from',
'lazy::_copy_from_and_resize',
'lazy::_log_softmax',
'lazy::_log_softmax_backward_data',
'lazy::add',
'lazy::addcdiv_',
'lazy::addmm',
'lazy::div',
'lazy::expand',
'lazy::fill_',
'lazy::mm',
'lazy::mul',
'lazy::native_batch_norm',
'lazy::native_batch_norm_backward',
'lazy::nll_loss_backward',
'lazy::nll_loss_forward',
'lazy::relu_',
'lazy::sqrt',
'lazy::sum',
'lazy::t',
'lazy::threshold_backward',
'lazy::view',
'lazy::zero_']

The lowering work is tracked by #65576.
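A small set comparison of the two resnet50 counter lists above makes it easy to see which missing ops only show up in training (mostly backward variants). This is a sketch using the 'aten::' entries copied from the eval and train outputs above:

```python
# 'aten::' counters from the resnet50 eval run above
eval_missing = {
    'aten::convolution_overrideable',
    'aten::max_pool2d_with_indices',
    'aten::mean.out',
}

# 'aten::' counters from the resnet50 train run above
train_missing = {
    'aten::convolution_backward_overrideable',
    'aten::convolution_overrideable',
    'aten::max_pool2d_with_indices',
    'aten::max_pool2d_with_indices_backward',
    'aten::mean.out',
    'aten::random_.to',
}

# Ops that only appear during training: the backward variants plus
# the random_ initialization op.
train_only = sorted(train_missing - eval_missing)
print(train_only)
```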

@alanwaketan
Collaborator Author

The last item, 'Enhance TorchBench with LTC metrics', is an optional task. I will revisit it once all other higher-priority tasks are completed.
