[LTC] Fail to run testcase of latest lazy_tensor_core branch #65465

leslie-fang-intel · 2021-09-22T13:40:12Z

I built the latest lazy_tensor_core branch at commit 7f3d592.
With that build, my test case fails in ltm.mark_step():

import torch
import torch.nn as nn
import copy
import time
import lazy_tensor_core
import lazy_tensor_core.core.lazy_model as ltm

lazy_tensor_core._LAZYC._ltc_init_ts_backend()

class SimpleNet(torch.nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.conv = torch.nn.Conv2d(64, 128, (3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        self.conv2 = torch.nn.Conv2d(128, 128, (3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        self.conv3 = torch.nn.Conv2d(64, 128, (1, 1), stride=(2, 2), padding=(0, 0), bias=False)

    def forward(self, x):
        x1 = self.conv(x)
        y1 = self.conv2(x1)
        y2 = self.conv3(x)
        y = y1 + y2
        y = torch.flatten(y, start_dim=1)
        return y

lazy_device = ltm.lazy_device()

model = SimpleNet()
model.train().to(lazy_device)

x = torch.rand(64, 64, 3, 3, requires_grad=True).to(lazy_device)
y = model(x)
yg = torch.rand(64, 512).to(lazy_device)
loss = nn.CrossEntropyLoss()
output = loss(y, yg)
output.backward()

ltm.mark_step()

@wconstab @alanwaketan Could you take a look?
Here is the full error message:

Traceback (most recent call last):
  File "test_lazy.py", line 46, in <module>
    ltm.mark_step()
  File "/home/lesliefang/pytorch_1_7_1/lazy_tensor/pytorch/lazy_tensor_core/lazy_tensor_core/core/lazy_model.py", line 727, in mark_step
    wait=xu.getenv_as('LTC_SYNC_WAIT', bool, False))
ValueError: stoi
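
The message `stoi` matches what libstdc++'s `std::stoi` reports when it cannot parse its input, and pybind11 surfaces `std::invalid_argument` as a Python `ValueError`. One possible, unconfirmed explanation is therefore a non-numeric string reaching an integer parse on the C++ side. Below is a minimal sketch for checking one candidate source, assuming the relevant settings are environment variables sharing the `LTC_` prefix seen in `LTC_SYNC_WAIT`. The helper name `non_integer_env_vars` is hypothetical and not part of lazy_tensor_core, and Python's `int()` is only an approximation of `std::stoi` (for example, `std::stoi` accepts trailing characters after the digits).

```python
import os


def non_integer_env_vars(prefix="LTC_", environ=None):
    """Return {name: value} for variables under `prefix` that int() rejects.

    Hypothetical diagnostic helper; the prefix is an assumption based on
    the LTC_SYNC_WAIT variable that appears in the traceback.
    """
    environ = os.environ if environ is None else environ
    bad = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        try:
            int(value)
        except ValueError:
            # Not a plain integer; worth a closer look, though not proof of the cause.
            bad[name] = value
    return bad


if __name__ == "__main__":
    for name, value in sorted(non_integer_env_vars().items()):
        print(f"{name}={value!r}")
```

If this prints nothing, the environment is probably not the culprit and the failing parse is more likely somewhere inside the C++ code path.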


alanwaketan · 2021-09-22T17:18:48Z

Yup, absolutely love to investigate what's going on.

leslie-fang-intel · 2021-09-24T04:45:55Z

@alanwaketan Thanks for taking a look at this issue. I suspect it may be related to the GCC or libc version: on another system (CentOS Linux release 8.4.2105) with GCC 8.4.1, I can't reproduce it.
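
To compare the failing and working systems side by side, a small report of the toolchain details could be collected on each. This is a minimal sketch, not an official PyTorch tool: the function name `collect_env_report` is hypothetical, and the `gcc` found on `PATH` is assumed to be the compiler used for the build, which may not hold.

```python
import platform
import shutil
import subprocess
import sys


def collect_env_report():
    """Gather OS, Python, libc, GCC, and torch versions into a dict."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        # libc_ver() returns (lib, version); both may be empty on some systems.
        "libc": " ".join(platform.libc_ver()).strip() or "unknown",
    }

    gcc = shutil.which("gcc")
    if gcc is None:
        report["gcc"] = "not found"
    else:
        result = subprocess.run(
            [gcc, "--version"], capture_output=True, text=True, check=False
        )
        lines = result.stdout.splitlines()
        report["gcc"] = lines[0] if lines else "unknown"

    try:
        import torch  # optional; only reported if importable
        report["torch"] = torch.__version__
    except ImportError:
        report["torch"] = "not installed"

    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Running this on both machines and diffing the output would show whether the GCC or libc versions actually differ between them.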

ezyang added the lazy (Lazy Tensor work items), module: lazy, and triaged labels Sep 22, 2021

alanwaketan self-assigned this Sep 22, 2021

alanwaketan added this to To do in Lazy Tensor Core via automation Sep 22, 2021

alanwaketan moved this from To do to In progress in Lazy Tensor Core Sep 22, 2021

alanwaketan moved this from In progress to To do in Lazy Tensor Core Sep 22, 2021
